research report in urdu

Matan (Urdu Research Journal)

Biannual research journal.

Biannual Double Blind Peer Reviewed Urdu Research Journal of Urdu Department, The Islamia University of Bahawalpur.

MATAN (متْن), Department of Urdu, IUB.

Creative Commons License

Status: approved


Status: applied


Starting Year: Feb. 2014

Articles Published: 404

Reviewed Articles: 27

Frequency: Quarterly

Language: Urdu

" Urdu Research Journal " is an open access refereed journal published quarterly. The Journal strives to publish work of high quality in research and literature works across the globe in Urdu language and literary theory. The aim of the journal is to provide high quality research material in Urdu for scholars and researchers.

EDITORIAL STAFF

Patron Prof. Ibne Kanwal

Chief Editor Dr. Uzair Israeel

Technical Assistant Nafees Ahmed

← Where Shall I Find One I Could Call Your Equal

Dr. Uzair Israeel

Tags: Editorial, Prof. Ibne Kanwal

← Pen Portrait: Prof. Ibne Kanwal

Dr. Shafi Ayub, CIL, JNU, New Delhi 110067

Tags: Prof. Ibne Kanwal

← Tributes in Verse

← Why? On the Passing of Prof. Ibne Kanwal, ← Ibne Kanwal Sahib (An Elegy in Masnavi Form).

Irshad Ahmad Irshad

← The Dust Returned to the Place It Was Kneaded From

Mateen Amrohvi

← The Passing of Ibne Kanwal (A Poem in Acrostic Form)

Ahmad Imtiaz

← Ibne Kanwal (A Sincere Man of Distinctive Bearing)

Sagheer Afraheem, former Head, Department of Urdu, Aligarh Muslim University, Aligarh

← An Old Companion: Prof. Ibne Kanwal

Dr. Sabir Godar

← A Positive Metaphor in a Negative Environment: Ibne Kanwal

Prof. Mohammad Kazim

← Where Are Such Restless Souls Born Now

Akmal Shadab, Assistant Professor, Department of Urdu, Khwaja Moinuddin Chishti Language University, Lucknow

← Alas, Prof. Ibne Kanwal: So Many Stories Came Back to the Heart

Shabnam Shamshad, Assistant Professor, Department of Urdu, MANUU Arts and Science College for Women, Srinagar

← The Memory of Our Admirable and Respected Teacher Ibne Kanwal Will Keep Returning

Mohammad Junaid Shakravi, Vice Principal, R. B. Jalan Inter College, Darbhanga

← Prof. Ibne Kanwal: Some Memories, Some Conversations

Dr. Afzal Misbahi, Assistant Professor and Section In-charge of Urdu, MMV, Banaras Hindu University, Varanasi, Uttar Pradesh, India

← Deeds and Character Live On in the World

Dr. Mumtaz Alam Rizvi, Editor-in-Chief, daily Qaumi Bharat

← Prof. Ibne Kanwal: Words and Memories of a Kind Teacher

Dr. Yameen Ansari, Editor, daily Inquilab, Noida, UP

← Prof. Ibne Kanwal: A Matchless Teacher, a Peerless Personality

Dr. Mohammad Shamsuddin, Assistant Director, Maulana Azad National Urdu University Study Centre, Banaras, Uttar Pradesh

← A Sincere Teacher: Prof. Ibne Kanwal

Dr. Siddharth Sudeep, Assistant Professor, Department of Urdu, Khwaja Moinuddin Chishti Language University, Lucknow

← Dastan Influences in Ibne Kanwal's Stories

Dr. Mohammad Arshad Nadvi, Assistant Professor (Ad hoc), Department of Urdu, Dyal Singh College (University of Delhi), Lodhi Road, New Delhi 3

← The Short Story "Pehla Aadmi" (The First Man): An Analysis

Obaidur Rahman Naseer, Research Scholar, Department of Urdu, University of Delhi, New Delhi (110007)

← A Critical Study of Ibne Kanwal's Short Story "Band Raaste" (Closed Roads)

Vijay Kumar, Research Scholar, Department of Urdu, University of Jammu, Jammu and Kashmir

← Bisat-e-Nishat-e-Dil: A Review

Prof. Farooq Bakhshi, former Head, Department of Urdu, Maulana Azad National Urdu University, Hyderabad

← Ibne Kanwal as a Sketch Writer

Shahid Iqbal, Research Scholar, University of Delhi, Delhi

← "Kuch Shaguftagi Kuch Sanjeedgi": A Treasury of Sketches

Ibrahim Afsar, Meerut, Uttar Pradesh

← An Analytical Study of Prof. Ibne Kanwal's Travelogues

Mohammad Yousuf, PhD Scholar, International Islamic University, Islamabad; Prof. Dr. Kamran Abbas Kazmi, Head, Department of Urdu and Persian, International Islamic University, Islamabad

← Ibne Kanwal's Travelogue "Chaar Khoont"

Afaq Haider, Guest Lecturer, Surendranath College for Women, Kolkata

← Prof. Ibne Kanwal's Travel Writing

Dr. Mohammad Amir, 002 Narmada Hostel, JNU, New Delhi

← Ibne Kanwal's Play "Khwab" (Dream): A Critical Study

Dimpla Devi, Research Scholar, Department of Urdu, University of Jammu

← A Creator in the Dastan Style: Ibne Kanwal

Prof. Aftab Ahmad Afaqi, Department of Urdu, Banaras Hindu University, Varanasi

← Prof. "Ibne Kanwal" in the Light of Urdu Literature

Dr. Mohammad Talib Ansari, Associate Professor, College of Education, Maulana Azad National Urdu University, Hyderabad

← Ibne Kanwal: Literary Contributions

Dr. Abdur Rahman, Rekhta Foundation, Noida, Uttar Pradesh

← Ibne Kanwal: A Bright Chapter of Urdu Literature

Tanveer Ahmad, Research Scholar, Department of Urdu, University of Delhi, Delhi 110007

← Prof. Ibne Kanwal: Educational Thought and Literary Contributions

Sonu Rajak, Research Scholar, MANUU College of Teacher Education, Darbhanga (Bihar)

← Ibne Kanwal's Personality and Literary Contributions

Dr. Mohammad Shahid Zaidi, Assistant Professor of Urdu, Government P.G. College, Sawai Madhopur (Rajasthan)

Vol. 39, No. 2 (Dec 2023) has been published

EDITORIAL BOARD HAS BEEN RECONSTITUTED

JOR (URDU) ACCEPTS ONLY INPAGE FORMAT

CALL FOR PAPER(S) IS OPEN

Subscribe to JOR (Urdu) for your library

Inauguration of the website

Editor's choice.

The Decline of Urdu's Cultural Society (Guest Editorial)

  • Dr. Athar Farouqui
  • December 31, 2023

The Trend of Drawing on Allusions Derived from Persian Literature in Urdu Ghazals: A Brief Review

  • Muhammad Mohsin Khalid


  • Journal of Research (Urdu)


Urdu: How To Write A Research Paper Or Thesis (مقالہ کیسے لکھیں؟) || Australian Islamic Library


Uploaded by nabeel.musharraf on June 26, 2016

  • Open access
  • Published: 12 December 2023

A hybrid dependency-based approach for Urdu sentiment analysis

Urooba Sehar¹, Summrina Kanwal²,³, Nasser I. Allheeib⁴, Sultan Almari⁵, Faiza Khan⁶, Kia Dashtipur⁷, Mandar Gogate⁷ & Osama A. Khashan⁸

Scientific Reports volume 13, Article number: 22075 (2023)

  • Computer science
  • Information technology

In the digital age, social media has emerged as a significant platform, generating a vast amount of raw data daily. This data reflects the opinions of individuals from diverse backgrounds, races, cultures, and age groups, spanning a wide range of topics. Businesses can leverage this data to extract valuable insights, improve their services, and effectively reach a broader audience based on users' expressed opinions on social media platforms. To harness the potential of this extensive and unstructured data, a deep understanding of Natural Language Processing (NLP) is crucial. Existing approaches to sentiment analysis (SA) often rely on word co-occurrence frequencies, which prove inefficient in practical scenarios. To address this research gap, this paper presents a framework for concept-level sentiment analysis that aims to improve the accuracy of SA. A comprehensive Urdu language dataset was constructed by collecting data from YouTube, consisting of various talks and reviews on topics such as movies, politics, and commercial products. The dataset was further enriched by incorporating language rules and Deep Neural Networks (DNN) to optimize polarity detection. For sentiment analysis, the proposed framework employs predefined rules to trigger sentiment flow from words to concepts, leveraging the dependency relations among different words in a sentence based on Urdu grammatical rules. In cases where predefined patterns are not triggered, the framework switches to its sub-symbolic counterpart, passing the data to the DNN for sentence classification. Experimental results demonstrate that the proposed framework surpasses state-of-the-art approaches, including LSTM, CNN, SVM, LR, and MLP, achieving an improvement of 6–7% on the Urdu dataset. In conclusion, this research paper introduces a novel framework for concept-level sentiment analysis of Urdu language data sourced from social media platforms. By combining language rules and DNN, the proposed framework outperforms existing methodologies in analyzing sentiment in Urdu text data.


Introduction

In this age of social media and networks, information spread online influences our choices, from selecting a movie to watch to daily purchases such as groceries and clothing, as well as services related to health and business. It is now common practice to review products and services online after purchasing and using them. E-commerce sites also encourage their users to comment on and review products after purchase so that those reviews can later be used to improve the quality of the products or services being provided, and to introduce new products based on the needs of the users. Businesses are using social media analytics across different platforms to improve their services and products according to the choices and opinions of their target audience.

Because the amount of data shared on online social platforms is so large, and pre-processing the raw data to extract the required information is so time-consuming, it is exceptionally difficult to interpret this data without the assistance of an intelligent automated system. If such systems are not built, data whose value may run to billions or trillions of dollars will be wasted rather than used to help businesses and enterprises follow the needs of their users and improve the quality of the products or services they provide.

SA is the process of analysing raw data containing user opinions and reviews about a variety of products and services in order to automatically interpret the polarity of the opinions that users share on social media platforms, in reviews and comments on e-commerce websites, and on blogging sites 1, 2.

Due to the diversity of the languages spoken worldwide, SA is a challenging area of research. Models proposed for one language generally do not transfer directly to another; even when a model developed for one language is applied to another, significant pre-processing is necessary to reap the benefits. Urdu, as a resource-constrained language, has a long way to go before it can showcase efficient and intelligent SA models, and Urdu SA is still in its infancy 3. Currently available approaches to SA give scant consideration to dependency-based grammatical rules. As illustrated in Fig. 1, the majority of models determine the polarity of a complete sentence from the polarities of its co-occurring words, which can misclassify a sentence because it ignores the grammatical context and the interdependence of words that shape the sentence's overall polarity.

Figure 1. A demonstration of how traditional approaches assign polarity to individual words and then use those word polarities to assign polarity to a sentence.

Dependency grammar rules are built on patterns of a language that allow sentiment to flow from words to concepts according to the dependency relations between them 4.

Hence, dependency-based rules take into account the hierarchical relations among keywords, conjunctions, and interjections, the order in which words occur, and the polarities of the individual words, for a more accurate determination of the underlying polarity of a sentence 5.

The main objective of this research work is to optimise the detection of sentiment polarity in Urdu sentences containing reviews or opinions on movies, products, and politics by combining Urdu grammar rules with various machine learning (ML) and deep neural network (DNN) models. In this study, a novel framework is proposed that integrates dependency grammar-based rules for the Urdu language with deep learning (DL) models for the exploration of the Urdu language dataset. The main contributions of this research paper are as follows:

  • Use of grammatical dependency-based rules for Urdu SA.
  • A framework that determines the polarity of Urdu sentences from the individual polarity of words together with their correlation and arrangement under the rules of Urdu grammar, in order to provide better classification than state-of-the-art polarity classification models.
  • A comprehensive discussion of different grammatical rules of the Urdu language and how they affect the polarity of sentences.
  • The use of the developed Urdu language datasets: movie reviews, political reviews, and product reviews.
  • Experimentation that integrates Urdu dependency grammar rules with ML models such as Support Vector Machine (SVM) and Logistic Regression (LR), and with DNN models such as Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN).

The structure of this research paper is as follows: section “Literature review” reviews prior studies on SA approaches, their shortcomings, and challenges. Next, section “Methodology” describes the research methodology and design of the proposed study, including the data collection process and the proposed framework. Section “A dependency rules-based SA framework” includes the results, discussion, and comparison of the proposed approach for SA. Finally, section “Dataset” summarises the paper with the conclusion and discusses future endeavours.

Literature review

In this section, the current research on SA and the use of ML, AI, and DNN in this field has been summarised, along with the research gaps that need to be addressed to make SA systems and approaches more effective and usable by practitioners in various fields.

Subramanian et al. 6 developed a SA model based on sequence-based Neural Networks, specifically using a CNN-LSTM approach on the IMDB movie review dataset. Alsayat et al. 7 introduced an ensemble deep learning language model to enhance sentiment analysis in social media applications. By conducting experiments using various datasets, including Twitter's coronavirus hashtag dataset and public review datasets from Amazon and Yelp, they demonstrated that their proposed models outperform other models in terms of classification accuracy. Aljameel et al. 8 introduced an SA approach for predicting public awareness of COVID-19 prevention measures in Saudi Arabia, using SVM, KNN, NB, and N-gram feature extraction. SVM with bigram in TF-IDF outperformed other models. Rao et al. 9 utilized multilevel features and a MFCNN model, combining multiple CNN features, to classify English text sentiment, outperforming a conventional CNN model. Yue et al. 10 proposed a task-oriented, granularity-oriented, and methodology-oriented SA approach for English social media sites. Prottasha et al. 11 examined the utilization of transfer learning via BERT-based supervised fine-tuning for sentiment analysis (SA). Their findings reveal that incorporating transfer learning and BERT in SA tasks surpasses alternative embedding techniques and algorithms, demonstrating superior performance. Ashir et al. 12 experimented with SVM, NB, MLP, AdaBoost, and LR classifiers on movie reviews and Twitter samples, reporting accuracy rates of 72% and 91.1%, respectively.

Grammatical rules differ between languages, as does the quality of the raw and pre-processed data available in each language. Dashtipour et al. 5 researched SA of hotel reviews in Persian, achieving high accuracy using a hybrid model that combines LSTM with dependency-based grammatical rules. Miranda et al. 13 conducted a comprehensive study on SA in Spanish, focusing on document-level SA. Can et al. 14 investigated language-generalized SA models, proposing an RNN-based technique for different languages, including resource-constrained languages. Chen et al. 15 proposed a lexicon-based approach for SA of Chinese social media posts, developing a comprehensive process and lexicon algorithm in their study. Poria et al. 16 proposed a multimodal SA classification approach utilizing deep learning algorithms and discussed challenges in multimodal SA research. Zadeh et al. 17 presented a framework based on tensor fusion techniques for multimodal SA, achieving high accuracy for textual, visual, and acoustic modalities. Rosas et al. 18 also presented a method for multimodal SA classification that can be used to determine the sentiments expressed in visual data streams at the utterance level. The results of their experiments on the Multimodal Opinion Utterances Dataset (MOUD) indicate that utterance-level sentiment classification achieved 74.09% accuracy across multiple modalities, including linguistic, acoustic, and visual models. Recently, Li et al. 19 suggested a novel SA classifier that combines a two-channel classifier with a neural tensor block. They tested their proposed model on three different standard datasets. The BiERU-lc model achieved a weighted average accuracy of 0.74% and an F1 score of 0.45% in their experimental study using IEMOCAP datasets. Chakravarthi et al. 20 developed a dataset for SA that includes comments in three Dravidian languages code-mixed with English: Tamil, Kannada, and Malayalam. Their dataset was compiled from user comments on various social media platforms, including YouTube. The study's results showed a weighted average accuracy of 0.68%. Kazmaier et al. 21 introduced various techniques for heterogeneous ensembles for SA and analysed the results through experimentation on their dataset. Additionally, they developed a novel model for SA based on ensemble learning of multiple SA approaches. The study's findings indicate that the proposed ensemble technique improved the results of SA on the Twitter dataset by approximately 5.53% and on the Yelp dataset by 0.43%. Aniello et al. 22 proposed an aspect-based reference SA model and suggested tools for quantifying opinions and sentiments within sentences.

Social media comments and reviews are also being analysed to see how SA can affect businesses. Cruz et al. 23 proposed a model to study the impact of financial accounts on stock market decision-making. Wang et al. 24 investigated the impact of SA models on fundraising campaigns and the growth of Internet finance. Bueno et al. 25 proposed a model for SA that makes decisions based on the business context. Aziz et al. 26 proposed a method for SA of reviews and comments on Roman Urdu eCommerce websites. They created a dataset containing 21,000 records with the assistance of a Kaggle dataset. They conducted experiments on a variety of machine learning and deep neural network-based models and compared them to their proposed approach. The results of their experimental study indicate that their model achieved an accuracy of 82.19% when sentiment classification was estimated using RANSAC (Random Sample Consensus). Mukhtar et al. 27 proposed a lexicon-based model for SA in Urdu. Chandio et al. 28 developed an SVM-based model for SA of Roman Urdu eCommerce reviews, reporting accuracy on the Urdu dataset they created. Khan et al. 29 utilized ML and DNN models to analyse multimodal sentiment in Urdu, with linear regression (LR) outperforming other models. Qureshi et al. 30 proposed an SA model for Roman Urdu reviews, achieving high accuracy using deep neural network techniques and logistic regression. In previous research 31, DL was used for multimodal SA of Urdu, achieving high accuracy for polarity prediction.

Li et al. 36 reported an accuracy of 0.91% when employing a CNN with an attention layer and transfer learning for SA on a dataset of Roman Urdu texts. Using rule-based machine learning algorithms (Support Vector Machine, Naive Bayes, AdaBoost, Multilayer Perceptron, Linear Regression, and Random Forest) and deep learning algorithms (Convolutional Neural Network, Long Short-Term Memory, Bidirectional Long Short-Term Memory, Gated Recurrent Units (GRUs), and Bidirectional GRU), Khan et al. 37 achieved an F1 score of 81.49%. Rehman and Soomro 38 analysed the sentiment of Urdu messages obtained from the social media platform Twitter, conducting experiments with various machine learning algorithms within the WEKA platform. They found that the SMO algorithm performed best in sentiment analysis of tweets written in Urdu (Nastaleeq), while the Random Forest approach produced the most favourable outcomes on Roman Urdu text. Chandio et al. 39 experimented with RU-BiLSTM, a deep recurrent architecture designed to examine the sentiment expressed in Roman Urdu. This BiLSTM-based architecture includes both word embedding and an attention mechanism, and the experiments on two Roman Urdu datasets yielded positive results. Khan et al. 40 put forth a novel deep learning framework for sentiment analysis in Roman Urdu and English dialects. This architecture consists of two layers: a Long Short-Term Memory (LSTM) layer for preserving long-term dependencies and a one-layer Convolutional Neural Network (CNN) model for extracting local features. The feature maps obtained by the CNN and LSTM models are passed to multiple machine learning classifiers so that the best classification can be attained. The evaluated accuracies of these classifiers on the MDPI, RUSA, RUSA-19, and UCL datasets are 0.904, 0.841, 0.740, and 0.748, respectively. The results suggest that for sentiment analysis in Roman Urdu, the Word2Vec CBOW model and the SVM classifier produce more favourable results. By contrast, for sentiment analysis of English, the BERT word embedding, a two-layer LSTM, and SVM as the classifier are considered more suitable. Ahmed et al. 41 presented a meta-learning ensemble approach that incorporates deep learning and foundational machine learning models for the Urdu language. The approach uses two levels of meta-classifiers and integrates the predictions produced by the inter-committee and intra-committee classifiers at those two levels. The results show that the suggested technique significantly improves the classification accuracy of the baseline deep models.

In their research, Altaf et al. 42 employed linguistic variables that are unique to the Urdu language to analyze sentiment at the sentence level. Furthermore, conventional machine learning methodologies were utilized to categorize idioms and proverbs. For this objective, the researchers employed a dataset that they had curated. The experimental results indicate that the J48 classifier performs best in sentiment classification, with a 90% accuracy rate and an 88% F-measure. Bashir et al. 43 presented the Urdu Nastalique Emotions Dataset (UNED), a collection of annotated phrases and paragraphs representing diverse emotions. Additionally, the authors put forth a deep learning (DL) methodology that classified six categories of emotions present in the UNED corpus. The results of the experiments indicate that the DL-based model outperforms generic machine learning approaches, with an F1 score of 85% on the UNED sentence-based corpus and 50% on the UNED paragraph-based corpus. Khan et al. 44 introduced a novel framework that capitalizes on the Cognitive Relationship (CR) between sarcasm and sentiment in order to enhance classification precision.

The dataset compiled by these researchers comprised 7000 tweets composed in standard Urdu. Experiments were conducted using a CR-based methodology to classify sarcasm and emotion, and the authors concluded that eXtreme Gradient Boosting and Linear Regression exhibit superior performance. The implementation of CR improved sentiment classification by 9.3% compared with the stand-alone method, and by approximately 22% compared with the initial distribution. Likewise, the implementation of CR for the classification of sarcasm showed an increase of 9.1% over the stand-alone method and an enhancement of approximately 23.6% over the initial distribution.

Despite recent advancements, there is a research gap in SA for resource-poor languages like Urdu, particularly in concept-level SA. This research aims to address this gap in Urdu linguistics research.

Methodology

In this section, the process of analysing dependency-based rules for Urdu SA has been summarised, as depicted in Fig.  2 .

Figure 2. Example of classifying sentence polarity with a dependency-based rule.

Identify Urdu grammar rules

Previous research lacked effective SA because it did not consider language rules when assigning polarity to a sentence and instead focused exclusively on the polarity of individual words. For example, consider the sentence shown in Fig. 3, “یہ موبائل بہتر ہے، اس کے سوا تمام موبائل ٹھیک کام نہیں کرتے” (This mobile is better; apart from it, none of the mobiles work properly). Judged by the polarity of its words alone, the presence of the negative word 'نہیں' (not) makes it seem that the speaker has a negative opinion about all mobiles. A state-of-the-art word-based classifier could therefore label the sentence negative without considering its context. The model's decision may also be conflicted, because the sentence contains the positive word 'بہتر' (better) as well. In such cases, the overall polarity cannot be identified correctly from the words alone. By contrast, when the grammatical context is considered and the sentence is analysed with the dependency-based rules of Urdu grammar, it has positive polarity because of the word 'سوا' (except). 'Except' separates a sentence into two clauses, mostly of opposite polarity: the clause before the exception word carries the polarity of the single entity that is the exception, and the other clause carries the polarity of the rest of the group. In our example, the first clause, with positive polarity, is therefore the deciding factor, and it can only be identified through the grammatical dependency-based rules of Urdu.

Figure 3. An example of polarity classification using our proposed grammatical rules-based classification technique.
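The exception rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-word lexicon, the whitespace tokenizer, and the names `exception_rule`, `word_sum`, and `label` are all hypothetical, and the example sentence uses the corrected spelling of the words for "mobile" and "properly".

```python
# Toy polarity lexicon: +1 positive, -1 negative.
# Hypothetical values for illustration, not the paper's lexicon.
LEXICON = {"بہتر": 1, "نہیں": -1}
EXCEPTION_WORD = "سوا"  # "except"
PUNCTUATION = "،۔,.!?؟"


def tokenize(sentence):
    """Split on whitespace and strip Urdu and Latin punctuation."""
    stripped = (word.strip(PUNCTUATION) for word in sentence.split())
    return [word for word in stripped if word]


def label(score):
    """Map a numeric score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def word_sum(tokens):
    """Word-level view: add up the polarity of every known word."""
    return sum(LEXICON.get(token, 0) for token in tokens)


def exception_rule(sentence):
    """Polarity of the clause before the exception word, or None if absent."""
    tokens = tokenize(sentence)
    if EXCEPTION_WORD not in tokens:
        return None
    first_clause = tokens[: tokens.index(EXCEPTION_WORD)]
    return label(word_sum(first_clause))


sentence = "یہ موبائل بہتر ہے، اس کے سوا تمام موبائل ٹھیک کام نہیں کرتے"
print(label(word_sum(tokenize(sentence))))  # word-level view of the whole sentence
print(exception_rule(sentence))             # rule-based view of the first clause
```

With this toy lexicon the positive and negative words cancel in the word-level view, which mirrors the conflict described in the text, while the rule looks only at the clause before the exception word.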

Consider the relationship between the words in the sentence “یہ فلم قدیم ہے، لیکن بری نہیں ہے” (This film is old, but it is not bad). Because this sentence contains three words with negative polarity, a traditional model will classify it as negative, as illustrated in Fig. 4. In reality, however, the sentence has positive polarity because the word 'لیکن' (but) shifts its overall polarity to positive.

Figure 4. An example of polarity classification using a traditional word-based classification technique.
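To make the word-based behaviour concrete, the following sketch sums per-word polarities and takes the sign of the total. It is only an illustration of the general idea: the lexicon entries and the function name `word_level_polarity` are assumptions, not part of any system cited in this paper.

```python
# Hypothetical toy lexicon: every entry is an assumption for illustration.
LEXICON = {"قدیم": -1, "بری": -1, "نہیں": -1, "اچھا": 1, "بہتر": 1}
PUNCTUATION = "،۔,.!?؟"


def tokenize(sentence):
    """Split on whitespace and strip Urdu and Latin punctuation."""
    stripped = (word.strip(PUNCTUATION) for word in sentence.split())
    return [word for word in stripped if word]


def word_level_polarity(sentence):
    """Sum the polarity of each known word; the sign of the sum is the label."""
    score = sum(LEXICON.get(token, 0) for token in tokenize(sentence))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# "This film is old, but it is not bad": three negative words, no grammar.
print(word_level_polarity("یہ فلم قدیم ہے، لیکن بری نہیں ہے"))
```

Because the three negative words are simply added together, the sketch labels the sentence negative, which is the misclassification the text describes.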

The same sentence is classified as positive under grammatical dependency rules, because the word 'but' signals that the second part counters the polarity of the first. In the current case, the negative polarity of 'This film is old' is countered by the clause that follows 'but'. As shown in Fig. 5, the two negative-polarity words 'بری' (bad) and 'نہیں' (not) cancel each other out, making the overall polarity of the second part of the sentence positive. Thus, the sentence has positive polarity, which traditional classification misses.

Figure 5
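One possible reading of this 'but' rule is sketched below: the clause after the contrast word decides the polarity, and a negation word inside that clause flips the polarity of the sentiment words it accompanies. The lexicon, the negator set, and the names `clause_score` and `contrast_rule` are hypothetical, and the parity-based negation handling is a simplification chosen for this sketch.

```python
# Hypothetical toy resources; none of these come from the paper.
SENTIMENT = {"قدیم": -1, "بری": -1, "اچھی": 1}
NEGATORS = {"نہیں", "مت"}
CONTRAST_WORD = "لیکن"  # "but"
PUNCTUATION = "،۔,.!?؟"


def tokenize(sentence):
    """Split on whitespace and strip Urdu and Latin punctuation."""
    stripped = (word.strip(PUNCTUATION) for word in sentence.split())
    return [word for word in stripped if word]


def label(score):
    """Map a numeric score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def clause_score(tokens):
    """Sum sentiment words, flipping the sign for an odd number of negators."""
    score = sum(SENTIMENT.get(token, 0) for token in tokens)
    negations = sum(1 for token in tokens if token in NEGATORS)
    return -score if negations % 2 else score


def contrast_rule(sentence):
    """Polarity of the clause after the contrast word, or None if absent."""
    tokens = tokenize(sentence)
    if CONTRAST_WORD not in tokens:
        return None
    second_clause = tokens[tokens.index(CONTRAST_WORD) + 1:]
    return label(clause_score(second_clause))


print(contrast_rule("یہ فلم قدیم ہے، لیکن بری نہیں ہے"))
```

In the film sentence the second clause contains one negative sentiment word and one negator, so the flip makes the clause, and therefore the sentence, positive.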

As the preceding two examples demonstrate, understanding the grammatical context of the language is critical to identifying a sentence's polarity correctly. Grammatical rules of the Urdu language that can alter a sentence's polarity have been identified in order to construct a model based on them. These examples show how the proposed approach can correctly classify sentences that conventional word-based sentiment classification techniques misclassify. The following section describes the grammatical dependency-based rules that our proposed model for Urdu SA employs.

Urdu grammar rules

As illustrated in Fig.  6 , several grammatical rules have been identified that contribute to a sentence's polarity alteration. This section also discusses the grammatical rules in detail, when they are triggered, and how the polarity is determined in the event of a trigger.

figure 6

List of Urdu grammar-based rules and their trigger words and events.

Negation clause

Trigger : A sentence contains one or more negation words such as 'نہیں' or 'مت'.

Action : The overall polarity of the sentence is changed according to the concept that is being negated. If a negative concept is negated, the polarity of the sentence is positive; if a positive concept is negated, the polarity of the sentence is negative. For example, 'یہ کتاب مجھے پسند نہیں ہے' (I don't like this book) has negative polarity. On the other hand, 'میں نے یہ کتاب پڑھی ہے، اسے خرید نے سے کترایئے مت۔' (I have read this book; do not hesitate to buy it) has overall positive polarity.
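The negation rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-word lexicon, the whitespace tokeniser, and the function name `negation_polarity` are hypothetical, and the rule is reduced to flipping the clause polarity when an odd number of negation words is present.

```python
# Toy polarity lexicon (hypothetical values): +1 positive, -1 negative.
LEXICON = {"پسند": 1, "بری": -1}
NEGATION_WORDS = {"نہیں", "مت"}


def negation_polarity(clause: str) -> int:
    """Return +1, -1 or 0 for one clause, flipping the sign on negation."""
    tokens = clause.split()
    # Words missing from the lexicon contribute zero polarity.
    score = sum(LEXICON.get(tok, 0) for tok in tokens)
    negations = sum(tok in NEGATION_WORDS for tok in tokens)
    # Assumption: an odd number of negation words flips the polarity.
    if negations % 2 == 1:
        score = -score
    return (score > 0) - (score < 0)


# "I don't like this book": a positive concept is negated.
print(negation_polarity("یہ کتاب مجھے پسند نہیں ہے"))
# "This film is not bad": a negative concept is negated.
print(negation_polarity("یہ فلم بری نہیں ہے"))
```

A real system would attach the negation to the word it governs in the dependency tree rather than counting negation words per clause; the count is used here only to keep the sketch short.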

Continuing clause (حروف وصل)

Trigger : Two clauses of the same weight are connected and express an opinion about the same thing. The word 'اور' (and) usually connects two clauses that have a continuing relationship.

Action : If one of the clauses has positive polarity, the other clause also has positive polarity, so the overall polarity of the sentence is positive. Likewise, if one of the clauses has negative polarity, the other clause also has negative polarity, so the overall polarity of the sentence is negative. For example, 'اس موبائل کی بیٹری کی میعاد کم ہے اور اس کا کیمرہ بھی اچھا نہیں ہے' (This mobile has low battery timing, and its camera is also not good) has overall negative polarity.

Complement clause

Trigger : A sentence contains ‘کہ’ (that).

Action : A sentence containing ‘کہ’ is divided into two parts, and the polarity of the first part is taken as the overall polarity of the sentence. For example: ‘سامسنگ موبائل کی اچھی بات یہ ہے کہ اس کا کیمرہ اور بیٹری کا معیادِاستعمال اچھی ہے’ (The good thing about the Samsung mobile is that it has a good camera and battery timing).

Exception clause

Trigger : A sentence contains an exception word such as 'سوا', which separates one object from a group of objects.

Action : When two clauses express opinions about a group of objects, the exception word separates the two clauses. The polarity of the single entity that is excepted from the group is found in the clause before the exception word, and the polarity of the group is found in the other clause. For example, 'یہ موبائل بہتر ہے، اس کے سوا تمام موبائل ٹھیک کام نہیں کرتے' (Except for this mobile phone, which is better, all other mobiles do not work properly). In this sentence, the first clause has positive polarity and the other clause has negative polarity. The overall polarity usually depends on the polarity of the first clause.
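To make the contrast with word-level scoring concrete, the sketch below compares a plain sum of word polarities with an exception-clause rule on the example sentence. It is an illustration only: the lexicon values, the choice to treat 'نہیں' as a negative word in the word-level view, and the function names are assumptions rather than details taken from the paper.

```python
# Hypothetical toy lexicon: "better" is positive, "not" is negative.
LEXICON = {"بہتر": 1, "نہیں": -1}
EXCEPTION_WORDS = {"سوا"}


def sign(value: int) -> int:
    return (value > 0) - (value < 0)


def clause_score(tokens) -> int:
    # Strip the Urdu comma and full stop that stay attached after splitting.
    return sum(LEXICON.get(tok.strip("،۔"), 0) for tok in tokens)


def word_level_polarity(sentence: str) -> int:
    """Word-based view: add up every word polarity in the sentence."""
    return sign(clause_score(sentence.split()))


def exception_rule_polarity(sentence: str):
    """Rule-based view: the clause before the exception word decides."""
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        if tok in EXCEPTION_WORDS:
            return sign(clause_score(tokens[:i]))
    return None  # the rule is not triggered


sentence = "یہ موبائل بہتر ہے، اس کے سوا تمام موبائل ٹھیک کام نہیں کرتے"
print(word_level_polarity(sentence), exception_rule_polarity(sentence))
```

In this toy setting the word-level sum is undecided, because one positive and one negative word cancel, while the rule reads the polarity from the clause before 'سوا'.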

Action and reason clause

Trigger : A sentence contains a reason or cause word (حرف علت) such as 'کیونکہ' (because) or 'اسلیے' (therefore). Such sentences state an opinion or compliment in the first clause, and the second clause, starting with a word like 'کیونکہ', explains the reason for that opinion or compliment.

Action : The polarity of an action and reason clause is determined by the polarity of the first clause, because the first clause states the opinion and the second clause gives the reason for it. For example: 'مجھے یہ کرسی پسند ہے کیونکہ یہ مضبوط ہے' (I like this chair because it is durable). The overall polarity of this sentence is positive, taken from the polarity of the first clause.

Preposition clause (حروف جار)

Disagreement clause (حروف استدراک)

Trigger : A sentence connects two clauses of different polarity with a word such as ‘مگر’ or ‘لیکن’ (but).

Action : The first part of the sentence states some disagreement, which is then clarified in the second part, after ‘مگر’ or ‘لیکن’. So, if the first part has negative polarity, the second part clarifies the disagreement and has positive polarity. The overall polarity of a sentence with a disagreement clause is that of the second clause: if the second clause is negative, the sentence is negative, and if the second clause is positive, the sentence is positive. For example, in 'یہ کتاب مہنگی ہے مگر مجھے اس کا معیار پسند ہے' (This book is expensive, but I like the quality of the book), the second clause has positive polarity, so the polarity of the sentence is positive.

Comparison clause

Trigger : A sentence contains a word such as ‘باوجود’ (despite), which explains something in comparison to an attribute of that object.

Action : A sentence that makes a comparison to an attribute of an object takes its polarity from the clause after the comparison word. For example: 'زیادہ قیمت کی باوجود اس موبائل کا معیار بہت کم ہے' (Despite the high price, the quality of this mobile is very low). This sentence has negative polarity because the clause after 'باوجود' has negative polarity.
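The clause-selecting rules above share one pattern: a trigger word splits the sentence, and one side of the split decides the polarity. The sketch below encodes that pattern as a lookup table. It is a simplified reading of the rules, assuming whitespace tokenisation instead of a dependency tree; the table layout and the name `deciding_clause` are hypothetical. The negation and continuing clauses are left out because they do not select one side of the sentence.

```python
# Trigger word -> (rule name, which side of the trigger decides polarity).
# The mapping follows the rules described above; the names are hypothetical.
RULES = {
    "کہ": ("complement", "before"),
    "سوا": ("exception", "before"),
    "کیونکہ": ("action and reason", "before"),
    "مگر": ("disagreement", "after"),
    "لیکن": ("disagreement", "after"),
    "باوجود": ("comparison", "after"),
}


def deciding_clause(sentence: str):
    """Return (rule name, deciding clause) for the first trigger found."""
    tokens = [tok.strip("،۔") for tok in sentence.split()]
    for i, tok in enumerate(tokens):
        if tok in RULES:
            name, side = RULES[tok]
            clause = tokens[:i] if side == "before" else tokens[i + 1:]
            return name, " ".join(clause)
    # No trigger word: the whole sentence is treated as a single clause.
    return None, " ".join(tokens)


# "This book is expensive, but I like the quality of the book."
print(deciding_clause("یہ کتاب مہنگی ہے مگر مجھے اس کا معیار پسند ہے"))
# "I like this chair because it is durable."
print(deciding_clause("مجھے یہ کرسی پسند ہے کیونکہ یہ مضبوط ہے"))
```

The selected clause would then be scored with the lexicon, so that the word for "expensive" in the first example no longer influences the result.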

A dependency rules-based SA framework

Here, the grammatical dependency rules for Urdu were combined with ML models, such as SVM, LR, and DNN models like LSTM and CNN. The primary goal of this integrated approach is to accurately classify Urdu sentences whose polarity or sentiment cannot be effectively determined using conventional word-based methods that solely rely on positive or negative words. By incorporating the grammatical dependency rules, which capture the interdependencies and relationships between words within a sentence, into the ML models, this research aimed to enhance the sentiment analysis process. This integration enables the framework to capture subtle nuances in sentiment that may go unnoticed by traditional word-based techniques. The central focus of this approach is to accurately classify sentences that demonstrate complex sentiment patterns, where determining polarity solely based on individual positive or negative words is challenging. By combining the linguistic knowledge embedded in the grammatical dependency rules with the predictive power of the ML models, the framework becomes more proficient in handling these intricate cases effectively. The steps of the proposed hybrid framework are depicted in Fig.  7 and are discussed here.

figure 7

The proposed research methodology for SA, based on Urdu grammatical dependency-based rules.

Data preprocessing

Tokenisation and normalisation techniques were used to pre-process the corpus. The sentences were stripped of numbers and punctuation. The sentences had already been tagged manually when the dataset was created, and the polarity per word was refined further, with zero polarity assigned to words that did not appear in the lexicon. Finally, a dependency tree was generated for each sentence. All of this was done with the urduhack Python package for the Urdu language [32, 33]. The proposed dependency-based rules classifier is fed the dependency tree together with the assigned polarities.
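The stripping, tokenising, and polarity lookup steps can be sketched with the Python standard library. This is not the urduhack pipeline the authors used: normalisation and dependency parsing are omitted, the lexicon is a hypothetical two-word stand-in, and the function names are illustrative.

```python
import unicodedata

# Hypothetical toy lexicon; the paper's lexicon is not reproduced here.
LEXICON = {"اچھا": 1, "کم": -1}


def preprocess(sentence: str) -> list:
    """Strip numbers and punctuation, then tokenise on whitespace."""
    cleaned = "".join(
        # Unicode categories starting with N are numbers, with P punctuation.
        " " if unicodedata.category(ch)[0] in "NP" else ch
        for ch in sentence
    )
    return cleaned.split()


def word_polarities(tokens: list) -> list:
    """Attach a polarity to each token; unknown words get zero."""
    return [(tok, LEXICON.get(tok, 0)) for tok in tokens]


# "The camera of this mobile is 100 percent good."
tokens = preprocess("اس موبائل کا کیمرہ 100 فیصد اچھا ہے۔")
print(word_polarities(tokens))
```

Filtering by Unicode category removes both ASCII and Urdu punctuation and digits without listing them by hand.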

Polarity prediction algorithm

To classify unseen sentences, the proposed framework incorporates the language’s dependency-based rules into the deep learning architecture. Below is the pseudocode for the proposed method:

figure a

Algorithm: Polarity Prediction
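The pseudocode figure is not reproduced in this text. Based only on the surrounding description (rules are applied first, and sentences the rules leave unclassified are passed to a deep learning classifier), the control flow might look like the sketch below. Every name here is hypothetical, and `toy_rules` and `toy_model` are stand-ins so the example runs on its own; they are not the authors' classifiers.

```python
from typing import Callable, Optional, Tuple

Label = str


def predict_polarity(
    sentence: str,
    rule_classifier: Callable[[str], Optional[Label]],
    fallback_classifier: Callable[[str], Label],
) -> Tuple[Label, str]:
    """Try the dependency rules first; defer to the model otherwise."""
    label = rule_classifier(sentence)
    if label is not None:
        return label, "rules"
    return fallback_classifier(sentence), "model"


# --- Hypothetical stand-ins so the sketch is self-contained ---
LEXICON = {"پسند": 1, "مہنگی": -1}


def toy_rules(sentence: str) -> Optional[Label]:
    tokens = sentence.split()
    if "مگر" in tokens:  # disagreement clause: keep the second clause
        tokens = tokens[tokens.index("مگر") + 1:]
    score = sum(LEXICON.get(tok, 0) for tok in tokens)
    if score == 0:
        return None  # left unclassified by the rules
    return "positive" if score > 0 else "negative"


def toy_model(sentence: str) -> Label:
    # Placeholder for a trained LSTM or CNN over fastText embeddings.
    return "negative"


print(predict_polarity("یہ کتاب مہنگی ہے مگر مجھے اس کا معیار پسند ہے",
                       toy_rules, toy_model))
```

Returning the source of the decision alongside the label makes it easy to count how many sentences the rules resolve on their own.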

Long short-term memory (LSTM)

As illustrated in Fig. 8, the proposed LSTM configuration begins with an input layer through which parsed Urdu sentences are passed to the model. The next two layers are stacked bidirectional LSTM layers with 128 and 64 cells, respectively. These are followed by a dropout layer and a dense layer with two neurons and softmax activation. The model's final layer is a fully connected output layer that determines the polarity of the sentences passed to it from the input layer [5].

figure 8

BiLSTM deep learning model for classifying sentiment in Urdu sentences from the dataset.

Convolutional neural network (CNN)

The CNN model used in this experimental study is depicted in Fig. 9. The model was trained, using the grammatical rules, to detect negative or positive polarity in a set of reviews on films, products, and politics.

figure 9

CNN deep learning model for classifying sentiment in Urdu sentences from the dataset.

Because the Urdu lexicon is small, the rule-based approach commonly uses positive polarity to classify sentences when word polarity is not available. SVM, LR, and MLP classifiers were also used as baselines against which to compare the performance of the proposed approach. The DNN architectures were trained and validated using the TensorFlow library and a Google Colab Pro GPU. The models were trained with backpropagation for 100 epochs, and the Adam optimiser was used to minimise the categorical cross-entropy loss. As part of the hybrid framework, the sentences left unclassified by the rule-based approach were transformed into 200-dimensional fastText word embeddings and fed into the deep learning classifiers.
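For readers unfamiliar with the training objective, the following sketch works through the softmax output of a two-neuron layer and the categorical cross-entropy loss for a single sentence. The logit values and function names are made up for illustration; in practice the TensorFlow library computes these quantities internally.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot, probs):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t)


# Hypothetical logits for the classes [positive, negative].
probs = softmax([2.0, 0.0])
# The gold label is "positive", encoded as a one-hot vector.
loss = categorical_cross_entropy([1, 0], probs)
print(probs, loss)
```

The loss shrinks towards zero as the probability assigned to the gold class approaches one, which is the quantity the Adam optimiser reduces during training.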

For the previous dataset [17], textual content generated through video transcription was used. The speakers in the dataset were aged between 20 and 40 years, and the videos had an average duration ranging from 3 to 8 min. The dataset was categorised into three distinct genres: film reviews, political commentary, and product reviews. The training set, comprising 70% of the data, was used to train the models, while the test set, the remaining 30%, was used to evaluate the models and report the results.
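A 70/30 split of this kind can be sketched as below. The text does not say whether the split was shuffled or stratified, so the seeded shuffle and the function name `train_test_split` are assumptions made for the illustration.

```python
import random


def train_test_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of the data and cut it into train and test parts."""
    shuffled = list(items)
    # Assumption: a fixed seed, so the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten hypothetical review identifiers.
train, test = train_test_split(range(10))
print(len(train), len(test))
```

With labelled reviews, a stratified variant that splits each class separately would keep the positive and negative proportions equal in both parts.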

Urdu movie review dataset

The Urdu movie review dataset contains 3000 reviews provided by various users, covering a wide range of films. It consists of 15,000 positive reviews and 15,000 negative reviews.

Political review dataset

The political review dataset comprises 4000 reviews, including 2000 positive reviews and 2000 negative reviews.

Product review dataset

The Urdu product review dataset comprises 2000 reviews from different users, with 1000 positive reviews and 1000 negative reviews.

Availability of data and materials

The dataset is publicly available on GitHub ( https://github.com/uroobasehar/datasethybriddependencybasedmodel ) for researchers to use in further experiments related to Urdu sentiment analysis models [32].

Results and analysis

Three datasets were used for the experiments. The results of both hybrid models and of the LSTM and CNN models are summarised in Table 1, along with comparisons to other models and techniques proposed in the literature. SVM obtained an accuracy of approximately 74.69%, with precision, recall, and F-measure of 0.74, 0.73, and 0.74, respectively. On the movie review dataset, the LR model obtained an accuracy of 72.53%, with precision (P), recall (R), and F-measure (F) values of 0.72, 0.71, and 0.72, respectively. Similarly, MLP alone provided an accuracy of approximately 73.92%, with P, R, and F values of approximately 0.73, 0.72, and 0.73, respectively. When the proposed dependency-based rules were applied, a significant improvement was observed in the accuracy of classifying Urdu sentences from the movie review dataset. As illustrated in Table 1, when the dependency-based rules are used alone, accuracy improves by approximately 6–7%, reaching approximately 80.56% with P, R, and F values of approximately 0.80, 0.79, and 0.80, respectively. The experiments also showed a noticeable improvement in classification accuracy when DNN models such as CNN and LSTM were used.
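The accuracy, P, R, and F values reported here follow the standard definitions, which the sketch below computes for a two-class problem. The labels are invented for illustration, and the text does not state whether the reported scores are averaged over both classes, so scoring the positive class alone is an assumption.

```python
def binary_metrics(gold, predicted, positive="positive"):
    """Return (accuracy, precision, recall, F-measure) for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, precision, recall, f_measure


# Five hypothetical test sentences.
gold = ["positive", "positive", "positive", "negative", "negative"]
pred = ["positive", "positive", "negative", "negative", "positive"]
print(binary_metrics(gold, pred))
```

The F-measure is the harmonic mean of precision and recall, so it stays low unless both are high.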

Both hybrid models, LSTM combined with dependency-based rules and CNN combined with dependency-based rules, improved accuracy by about 15–17% over the state-of-the-art models. Of the two hybrid approaches, hybrid 2, the combination of LSTM and dependency-based rules, performed best among all models, achieving an accuracy of 89.75% and P, R, and F of 0.89, 0.88, and 0.89, respectively.

An ablation study was also performed to understand how each part works in isolation. Tables 2, 3, and 4 report the outcomes of the ablation study on the movie, hotel, and product review corpora, respectively. The experimental results show that the exception clause achieved better accuracy than the other rules on all review datasets, while the disagreement clause achieved the lowest performance.

Experiments on the political review dataset are reported in Table 4. The hybrid models outperformed all other approaches, with an accuracy of 93.05% and P, R, and F of 0.93, 0.92, and 0.93, respectively. Similarly, for the product review dataset, the hybrid models outperformed the other models (Table 4).

Table 5 summarises the results of the experiments carried out to compute polarity with the proposed hybrid models for various sentences from the datasets. Complex sentences with multiple clauses and phrases, whose polarities differ because of grammatical aspects hidden within them, are classified correctly. This is because the hybrid approach takes the language's dependency rules into account.

Figures 10, 11, and 12 show the evolution of the learning curves, which provide insight into the behaviour of the various models. The learning curves smooth out over time.

figure 10

Train and validation loss for MLP Model over 100 epochs.

figure 11

Train and validation loss for LSTM Model over 100 epochs.

figure 12

Train and validation loss for CNN Model over 100 epochs.

Conclusion and future work

Digital media, as an integral part of our daily lives, plays a crucial role in generating and distributing massive amounts of data every day, containing the perspectives of diverse people from diverse regions of the world on a variety of subjects and issues. Reviewing products and services and commenting on items sold on e-commerce sites has become a widespread practice. This daily deluge of data creates a clear need for processing and analysis so that the data can be used to enhance product and service quality. Over the last decade, researchers have actively contributed to the body of knowledge on SA in a variety of languages spoken worldwide. Urdu SA still requires researchers' attention in order to develop effective and efficient models for detecting the polarity of sentiments expressed in Urdu sentences that people share online about the products and services they use in their daily lives. In this study, we propose a hybrid framework for detecting the polarity of sentiments in Urdu using multiple deep neural network approaches and dependency-based Urdu grammatical rules. This work is a continuation of previous work [8], in which multimodal SA was used. Three distinct datasets were used in the experiments: movie reviews, product reviews, and political reviews. Results were reported using SVM, Logistic Regression, Multilayer Perceptron (MLP), and deep learning (DL) models, as well as DL models combined with dependency-based rules for improved prediction. The experimental results demonstrate that the proposed hybrid approach outperforms state-of-the-art SA methods by nearly 10%.

In future work, we recommend addressing the issue of unclassified sentences by expanding our lexicon, and investigating the generalisation capability of the hybrid framework on additional challenging corpora from a variety of applications, including emotion-sensitive companions. We intend to optimise the prediction model using the hyperparameter optimisation technique suggested in [35], and to investigate multimodal datasets with language dependency rules.

References

Kumar, A., Srinivasan, K., Cheng, W.-H. & Zomaya, A. Y. Hybrid context enriched deep learning model for fine-grained sentiment analysis in textual and visual semiotic modality social data. Inf. Process. Manage. 57 (1), 102141. https://doi.org/10.1016/j.ipm.2019.102141 (2020).

Nawaz, A. et al. Extractive Text Summarization Models for Urdu Language. https://www.semanticscholar.org/paper/Extractive-Text-Summarization-Models-for-Urdu-Nawaz-Bakhtyar/f8ab2a156ab465b1c550082710b2286a7d593d5e (2020).

D’Orazio, M., Di Giuseppe, E. & Bernardini, G. Automatic detection of maintenance requests: Comparison of human manual annotation and sentiment analysis techniques. Autom. Constr. 134 , 104068 (2022).

Peng, H., Cambria, E. & Hussain, A. A review of sentiment analysis research in Chinese Language. Cogn. Comput. 9 (4), 423–435. https://doi.org/10.1007/s12559-017-9470-8 (2017).

Dashtipour, K. et al. A hybrid Persian sentiment analysis framework: Integrating dependency grammar based rules and deep neural networks. https://www.semanticscholar.org/paper/A-Hybrid-Persian-Sentiment-Analysis-Framework%3A-and-Dashtipour-Gogate/011deb3758ab35af25a4cee4726c0d6acfeb4941 (2020).

Subramanian, R. R. et al. A Survey on Sentiment Analysis. In 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence) 70–75 (2021). https://doi.org/10.1109/Confluence51648.2021.9377136.

Alsayat, A. Improving sentiment analysis for social media applications using an ensemble deep learning language model. Arab. J. Sci. Eng. 2021 , 1–13 (2021).

Aljameel, S. S. et al. A sentiment analysis approach to predict an individual’s awareness of the precautionary procedures to prevent COVID-19 outbreaks in Saudi Arabia. Int. J. Env. Res. Public Health 18 (1), 1. https://doi.org/10.3390/ijerph18010218 (2021).

Rao, L. Sentiment analysis of english text with multilevel features. Sci. Program. 2022 , e7605125. https://doi.org/10.1155/2022/7605125 (2022).

Yue, L., Chen, W., Li, X., Zuo, W. & Yin, M. A survey of sentiment analysis in social media. Knowl. Inf. Syst. 60 (2), 617–663. https://doi.org/10.1007/s10115-018-1236-4 (2019).

Prottasha, N. J. et al. Transfer learning for sentiment analysis using BERT based supervised fine-tuning. Sensors 22 , 4157 (2022).

Ashir, A. M. A generalized method for sentiment analysis across different sources. Appl. Comput. Intell. Soft Comput. 2021 , 2529984. https://doi.org/10.1155/2021/2529984 (2021).

Miranda, C. H. & Guzmán, J. A review of sentiment analysis in Spanish. Tecciencia 12 (22), 35–48. https://doi.org/10.18180/tecciencia.2017.22.5 (2017).

Can, E. F., Ezen-Can, A., & Can, F. Multilingual sentiment analysis: An RNN-based framework for limited data. Retrieved from arXiv preprint arXiv:1806.04511 (2018).

Chen, J., Becken, S., & Stantic, B. Lexicon-based Chinese language sentiment analysis method (2019). https://www.semanticscholar.org/paper/Lexicon-based-Chinese-language-sentiment-analysis-Chen-Becken/31730d51500a4c6b82a304a191c6cd8e4470e0a0 .

Poria, S. et al. Multimodal sentiment analysis: Addressing key issues and setting up the baselines. IEEE Intell. Syst. 33 (6), 17–25 (2018).

Zadeh, A., Chen, M., Poria, S., Cambria, E., & Morency, L.-P. Tensor fusion network for multimodal sentiment analysis. arXiv:1707.07250 (2017).

Pérez-Rosas, V., Mihalcea, R., & Morency, L.-P. Utterance-level multimodal sentiment analysis. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 973–982 (2013).

Li, W., Shao, W., Ji, S. & Cambria, E. BiERU: Bidirectional emotional recurrent unit for conversational sentiment analysis. Neurocomputing 467 , 73–82 (2022).

Chakravarthi, B. R. et al. Dravidiancodemix: Sentiment analysis and offensive language identification dataset for dravidian languages in code-mixed text. Lang. Resourc. Eval. 2022 , 1–42 (2022).

Kazmaier, J. & van Vuuren, J. H. The power of ensemble learning in sentiment analysis. Expert Syst. Appl. 187 , 115819 (2022).

D’aniello, G., Gaeta, M., & Rocca, I. L. KnowMIS-ABSA: An overview and a reference model for applications of sentiment analysis and aspect-based sentiment analysis (2022). https://www.semanticscholar.org/paper/KnowMIS-ABSA%3A-an-overview-and-a-reference-model-for-D%E2%80%99aniello-Gaeta/8cdeda9efbe2d1c12a406f8903ac698e8a1fef95 .

Valle-Cruz, D., Fernandez-Cortez, V., López-Chau, A. & Sandoval-Almazán, R. Does twitter affect stock market decisions? financial sentiment analysis during pandemics: A comparative study of the h1n1 and the covid-19 periods. Cogn. Comput. 14 (1), 372–387 (2022).

Wang, W., Guo, L. & Wu, Y. J. The merits of a sentiment analysis of antecedent comments for the prediction of online fundraising outcomes. Technol. Forecast. Soc. Change 174 , 121070 (2022).

Bueno, I., Carrasco, R. A., Ureña, R. & Herrera-Viedma, E. A business context-aware decision-making approach for selecting the most appropriate sentiment analysis technique in e-marketing situations. Inf. Sci. 589 , 300–320 (2022).

Aziz, S., Ullah, S., Mushtaq, M., Mughal, B. & Zahra, S. Roman Urdu sentiment analysis using machine learning with best parameters and comparative study of machine learning algorithms. Pak. J. Eng. Technol. https://doi.org/10.51846/vol3iss2pp172-177 (2020).

Mukhtar, N., Khan, M. A. & Chiragh, N. Lexicon-based approach outperforms supervised machine learning approach for Urdu sentiment analysis in multiple domains. Telem. Inf. 35 (8), 2173–2183 (2018).

Kanw, B. et al. Sentiment analysis of roman Urdu on e-commerce reviews using machine learning. CMES-Comput. Model. Eng. Sci. 131 (1), 393–413 (2022).

Khan, L., Amjad, A., Ashraf, N., Chang, H.-T. & Gelbukh, A. Urdu sentiment analysis with deep learning methods. IEEE Access 9 , 97803–97812. https://doi.org/10.1109/ACCESS.2021.3093078 (2021).

Qureshi, M. A. et al. Sentiment analysis of reviews in natural language: Roman Urdu as a case study. IEEE Access 10 , 24945–24954. https://doi.org/10.1109/ACCESS.2022.3150172 (2022).

Sehar, U. et al. Urdu sentiment analysis via multimodal data mining based on deep learning algorithms. IEEE Access 9 , 153072–153082. https://doi.org/10.1109/ACCESS.2021.3122025 (2021).

uroobasehar. “uroobasehar/datasethybriddependencybasedmodel” (2022, accessed 5 sep 2022). https://github.com/uroobasehar/datasethybriddependencybasedmodel .

UrduHack. “UrduHack” (2022, accessed 24 Apr 2022). https://urduhack.com/ .

Ghulam, H., Zeng, F., Li, W. & Xiao, Y. Deep learning-based sentiment analysis for roman urdu text. Procedia Comput. Sci. 147 , 131–135. https://doi.org/10.1016/j.procs.2019.01.202 (2019).

Summrina, K., Amir, H, & Kaizhu, H. Novel Artificial Immune Networks-based optimization of shallow machine learning (ML) classifiers. In Expert Systems with Applications 165 (2021, accessed 24 Apr 2022). https://jglobal.jst.go.jp/en/detail?JGLOBAL_ID=202102259372741659 .

Li, D. et al. Roman Urdu sentiment analysis using transfer learning. Appl. Sci. 12 (20), 10344 (2022).

Khan, L. et al. Multi-class sentiment analysis of Urdu text using multilingual BERT. Sci. Rep. 12 , 5436. https://doi.org/10.1038/s41598-022-09381-9 (2022).

Rehman, I. & Soomro, T. R. Urdu sentiment analysis. Appl. Comput. Syst. 27 , 30–42. https://doi.org/10.2478/acss-2022-0004 (2022).

Chandio, B. A., Imran, A. S., Bakhtyar, M., Daudpota, S. M. & Baber, J. Attention-based RU-BiLSTM sentiment analysis model for roman Urdu. Appl. Sci. 12 , 3641. https://doi.org/10.3390/app12073641 (2022).

Khan, L., Amjad, A., Afaq, K. M. & Chang, H.-T. Deep sentiment analysis using CNN-LSTM architecture of english and roman Urdu text shared in social media. Appl. Sci. 12 , 2694. https://doi.org/10.3390/app12052694 (2022).

Ahmed, K. et al. Contextually enriched meta-learning ensemble model for Urdu sentiment analysis. Symmetry 15 (3), 645 (2023).

Altaf, A. et al. Exploiting linguistic features for effective sentence-level sentiment analysis in Urdu language. Multimed. Tools Appl. 82 , 41813–41839. https://doi.org/10.1007/s11042-023-15216-0 (2023).

Bashir, M. F. et al. Context-aware emotion detection from low-resource urdu language using deep neural network. ACM Trans. Asian Low-Resourc. Lang. Inf. Process. 22 (5), 1–30. https://doi.org/10.1145/3528576 (2023).

Khan, M. Y., Ahmed, T., Siddiqui, M. S. & Wasi, S. Cognitive relationship-based approach for urdu sarcasm and sentiment classification. IEEE Access 2023 , 1–1. https://doi.org/10.1109/ACCESS.2023.3325048 (2023).

Acknowledgements

Researcher Supporting Project number (RSPD2023R609), King Saud University, Riyadh, Saudi Arabia.

Open access funding provided by Royal Institute of Technology.

Author information

Authors and affiliations.

Capital University of Science & Technology, Islamabad, 44000, Pakistan

Urooba Sehar

Division of Theoretical Computer Science, KTH Royal Institute of Technology Stockholm, Stockholm, Sweden

Summrina Kanwal

Center of Applied Intelligence Systems Research, Halmstad University, 302 50, Halmstad, Sweden

Department of Information Systems, College of Computer and Information Sciences, King Saud University, P.O. Box 51178, Riyadh, Saudi Arabia

Nasser I. Allheeib

Department of Computing and Informatics, Saudi Electronic University, 11673, Riyadh, Saudi Arabia

Sultan Almari

Riphah International University, Islamabad, 45320, Pakistan

School of Computing, Edinburgh Napier University, Edinburgh, EH10 5DT, UK

Kia Dashtipur & Mandar Gogate

Research and Innovation Centers, Rabdan Academy, P.O. Box 114646, Abu Dhabi, United Arab Emirates

Osama A. Khashan

Contributions

U.S. and S.K. wrote the main manuscript, and F.K. formatted the manuscript. K.D., M.G. and N.I.A. developed the algorithm and generated the results. F.K., S.K., S.Al. and O.A.K. prepared the data. All authors reviewed and proofread the document.

Corresponding author

Correspondence to Summrina Kanwal .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Sehar, U., Kanwal, S., Allheeib, N.I. et al. A hybrid dependency-based approach for Urdu sentiment analysis. Sci Rep 13 , 22075 (2023). https://doi.org/10.1038/s41598-023-48817-8

Received : 23 January 2023

Accepted : 30 November 2023

Published : 12 December 2023

DOI : https://doi.org/10.1038/s41598-023-48817-8

Open Access

Peer-reviewed

Research Article

SentiUrdu-1M: A large-scale tweet dataset for Urdu text sentiment analysis using weakly supervised learning

Roles Data curation, Formal analysis, Writing – original draft

Affiliation Dept. of Computer Science, Sukkur IBA University, Sukkur, Pakistan

Roles Funding acquisition, Supervision, Writing – review & editing

Affiliation Dept of Computer Science (IDI), Norwegian University of Science and Technology (NTNU), Gjøvik, Norway

Roles Conceptualization, Methodology, Project administration, Supervision, Writing – original draft

Roles Formal analysis, Supervision, Writing – review & editing

Affiliation Department of Informatics, Linnaeus University, Växjö, Sweden

Roles Data curation, Formal analysis, Methodology, Writing – original draft

  • Abdul Ghafoor, 
  • Ali Shariq Imran, 
  • Sher Muhammad Daudpota, 
  • Zenun Kastrati, 
  • Sarang Shaikh, 
  • Rakhi Batra

PLOS

  • Published: August 30, 2023
  • https://doi.org/10.1371/journal.pone.0290779

Low-resource languages are gaining much-needed attention with the advent of deep learning models and pre-trained word embeddings. Though spoken by more than 230 million people worldwide, Urdu is one such low-resource language; it has recently gained popularity online and is attracting considerable attention and support from the research community. One challenge faced by such resource-constrained languages is the scarcity of publicly available large-scale datasets for conducting any meaningful study. In this paper, we address this challenge by collecting the first-ever large-scale Urdu tweet dataset for sentiment analysis and emotion recognition. The dataset consists of 1,140,821 tweets in the Urdu language. Manually labeling such a large number of tweets would be tedious, error-prone, and practically impossible; therefore, the paper also proposes a weakly supervised approach to label tweets automatically. Emoticons used within the tweets, together with SentiWordNet, are used to categorize the extracted tweets into positive, negative, and neutral categories. Baseline deep learning models are implemented to compute the accuracy of three labeling approaches: VADER, TextBlob, and our proposed weakly supervised approach. Unlike the weakly supervised labeling approach, VADER and TextBlob label most tweets as neutral and correlate highly with each other. This is largely attributed to the fact that these models do not consider emoticons when assigning polarity.

Citation: Ghafoor A, Imran AS, Daudpota SM, Kastrati Z, Shaikh S, Batra R (2023) SentiUrdu-1M: A large-scale tweet dataset for Urdu text sentiment analysis using weakly supervised learning. PLoS ONE 18(8): e0290779. https://doi.org/10.1371/journal.pone.0290779

Editor: Daniela Moctezuma, Centro de Investigacion en Ciencias de Informacion Geoespacial AC (Research Center on Geospatial Information Sciences), MEXICO

Received: November 1, 2022; Accepted: August 15, 2023; Published: August 30, 2023

Copyright: © 2023 Ghafoor et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data underlying the results presented in the study are available from https://data.mendeley.com/datasets/rz3xg97rm5/1 .

Funding: This work was supported in part by the Department of Computer Science (IDI), Faculty of Information Technology and Electrical Engineering, Norwegian University of Science and Technology (NTNU), Gjøvik, Norway; and in part by the Curricula Development and Capacity Building in Applied Computer Science for Pakistani Higher Education Institutions (CONNECT) Project NORPART-2021/10502, funded by DIKU. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

We are living in an era where our trust in technology is so strong that the majority of our decisions are influenced by it. Imagine you want to upgrade your phone to the latest released version; it is almost certain that your first step would be to read online customer reviews about it. From the product owner’s perspective, processing these reviews manually is a nightmare. Therefore, in the field of natural language processing, the area of sentiment analysis helps label reviews automatically into distinct categories, mostly negative, positive, and neutral. The literature reports two main approaches for processing text for sentiment analysis. The first is the lexicon-based approach [ 1 – 8 ], which counts the number of positive and negative words to label a text segment, whereas the second is the machine learning approach [ 9 – 13 ], which exploits different supervised and unsupervised algorithms to extract sentiment from the text.
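
As a rough illustration of the lexicon-based idea, the sketch below labels a text by comparing counts of positive and negative words. The two word lists and the function name are hypothetical placeholders, not a real lexicon or any method from the cited works.

```python
# Toy word lists; illustrative assumptions only, not a real sentiment lexicon.
POSITIVE_WORDS = {"good", "great", "excellent", "love"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "hate"}


def lexicon_label(text):
    """Label text by comparing counts of positive and negative lexicon words."""
    tokens = text.lower().split()
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"
```

Real lexicon-based systems typically add tokenization, negation handling, and weighted scores on top of this counting step.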

The Sapir-Whorf Hypothesis [ 14 ] states that certain thoughts of an individual in one language cannot be understood by those who live in another language; that is, the way people think is strongly affected by their native languages. Therefore, expressing sentiments or opinions is much easier in the mother tongue than in any other language. Expressing feelings of hatred or love is difficult in a second language but seamless in the mother tongue. For example, people from Pakistan would express their feelings or sentiments more freely and realistically while writing in Urdu, whereas Indians would prefer expressing emotions in Hindi. The field of sentiment analysis has made significant progress in processing English and other resource-rich languages. However, for resource-poor languages like Urdu, sentiment analysis is still in its infancy. Recently, a few attempts have been made to extract sentiment from different languages, including Thai [ 15 ], Korean [ 16 ], Arabic [ 17 ], Chinese [ 18 ], Portuguese [ 19 ] and Malay [ 20 ].

According to 2021 estimates of Ethnologue, Urdu is ranked as the 10th most widely spoken language in the world, with 230 million speakers ( https://www.ethnologue.com/guides/ethnologue200 ). It is the official language of Pakistan and a few parts of India. It is also spoken in many parts of Bangladesh, Nepal, and the Middle East, and by a significant diaspora from these countries in Europe, Canada, and the USA. Indeed, the figure of 230 million is an understatement; a similar number of people speak Urdu as their second language. More importantly, Hindi, the most dominant language of highly populated India with more than 490 million speakers, closely resembles Urdu. Although the two languages use different scripts, they are very similar when spoken. Most people who understand Hindi also understand Urdu and can converse without much difficulty.

Urdu is also being extensively used as an internet language with growing news platforms, including BBC-Urdu, Dawn, Express Tribune, and other giant media houses with dedicated Urdu news websites. Social media platforms have also seen a significant rise in the usage of the Urdu language as a communication medium [ 21 ].

Despite such a vast population of speakers and the importance of the Urdu language, from a machine learning perspective Urdu is still considered a resource-poor language, for it does not have many large datasets, unlike English, Spanish, and other resource-rich languages. For example, there is no equivalent of Sentiment140 [ 22 ] in the Urdu language. Most recent attempts have resulted in datasets of, at most, a few thousand instances. Therefore, natural language processing tasks for the Urdu language, such as classification, summarization, seq2seq modeling, and text generation, are still in their infancy. The main reason behind the lack of large datasets in the Urdu language, especially for sentiment analysis, is that automatic labeling techniques have not been exploited. Most of the available Urdu datasets have been tagged through a manual labeling process, resulting in only a few thousand instances. SentiUrdu-1M, proposed in this paper, is a large-scale Urdu tweet dataset labeled through weakly supervised automated techniques. The specific contributions of this work are listed below:

  • Collected a large-scale Urdu tweet dataset called SentiUrdu-1M for sentiment analysis and emotion recognition tasks.
  • Proposed a weakly supervised technique to label the tweets as positive, negative, or neutral. Emoticons, along with SentiWordNet, are used to train a model on a subset of the dataset for semi-supervised classification.
  • Established baseline results for deep learning models on the newly collected large-scale tweet dataset.
  • Compared the baseline model results on data labeled via VADER and TextBlob with those obtained using the weakly supervised technique.

We strongly believe that SentiUrdu-1M would cause an advancement in processing Urdu language from an NLP perspective, and the tasks such as text summarization, classification, seq2seq modeling, and Urdu text generation would benefit from it.

The rest of the article is structured as follows. Section 2 presents the related work. A large-scale Urdu tweet dataset is explained in Section 3. Section 4 describes the data annotation and labeling techniques. Experimental settings are provided in Section 5 followed by results and their analysis presented in Section 6. Finally, the conclusion is drawn in Section 7.

2. Related work

Sentiment analysis is the study of people’s opinions, attitudes, and emotions toward individuals, businesses, and topics. For example, businesses want to find customers’ views about their products or services, and customers also read other people’s reviews about a product before buying. The terms sentiment analysis and emotion detection are often used interchangeably, but they are quite different. Emotion is a complex psychological state, such as fear, anger, or happiness. Sentiment is a mental attitude produced by negative, positive, and neutral feelings. To extract sentiment from text, it is necessary to understand subjectivity and emotion, two crucial concepts of sentiment analysis.

Subjectivity: subjective sentences comprise personal feelings or beliefs, e.g., opinions, allegations, suspicions, and desires. A subjective sentence may not contain an opinion. For example, “I want a phone with good voice quality” [ 23 ] seems positive, but it does not express any opinion.

Emotions: emotions are subjective feelings and thoughts. There are six primary emotions: joy, sadness, fear, anger, surprise, and disgust [ 24 ]. Emotions play vital roles in the existence or the complete make-up of individuals. (1) Joy is a pleasant emotional state defined by feelings of happiness, satisfaction, well-being, and gratification; it is expressed through smiling, a pleasant way of talking, and a relaxed body language stance. (2) Sadness is defined by feelings of disappointment, sorrow, and uselessness; a dull mood, crying, quietness, and feeling down are a few ways to express it. (3) Fear is an emotional state often expressed as a result of perceived danger. (4) Anger can be defined by feelings of frustration or hostility towards others. It can be expressed by glaring, turning away, yelling, hitting, or throwing objects. (5) Surprise can be defined as a physiological startle response following something unexpected; it is expressed by screaming, jumping back, widening the eyes, and opening the mouth. (6) Disgust often results from an unpleasant event and can be expressed by wrinkling the nose and curling the upper lip.

Sentiment analysis on English text is almost a decade and a half old. Popular resources such as the IMDB dataset [ 25 ], Sentiment140 [ 22 ], Twitter US Airline Sentiment [ 26 ], and Amazon Product Reviews [ 27 ] have brought performance in this field to almost human-level accuracy. However, resource-poor languages still lack datasets of decent size (in excess of 100,000 instances) and thus suffer from low performance.

2.1 Sentiment analysis for the Urdu language

Recently, many studies have been performed on Urdu text for sentiment analysis. These studies share a common issue: the datasets are limited to a few thousand instances, whereas modern deep learning algorithms, which have outperformed traditional machine learning algorithms, are data-hungry. Researchers have proposed several approaches to assign polarity to Urdu text. The majority have used manual human annotation for this task; however, a few studies have also used multi-lingual and POS tagging approaches. This section discusses recent studies on dataset development for Urdu text sentiment analysis.

As discussed above, most research studies have used the manual labeling approach to create an Urdu dataset. Bilal et al. [ 28 ] developed a dataset consisting of 300 Roman-Urdu opinions, with 150 samples each for the negative and positive classes. The opinions were extracted from blogs and labeled by human annotators. Text classification was performed using three machine learning algorithms: Naïve Bayes, Decision Tree, and KNN. Experimental results revealed that Naïve Bayes performed better than the other algorithms. In [ 29 ], the authors proposed an Urdu-based lexicon for sentiment analysis. The authors scraped more than 26,000 Urdu tweets from three Twitter accounts: (1) jang_akhbar, (2) BBCUrdu__, and (3) Dawn_News. They manually created an Urdu lexicon word list of 20,171 unique words from the tweets and assigned a POS tag to each word. The study only considered nouns, adjectives, and adverbs, which reduced the final list to 12,808 words. Afterward, human experts in the Urdu language were asked to label the unique nouns, adjectives, and adverbs as negative, positive, or neutral.

Mukhtar et al. [ 30 ] have compared the lexicon-based approach with the machine learning approach for Urdu sentiment analysis. The study reveals that the lexicon-based approach outperformed machine learning. The authors collected 6,025 Urdu sentences from 151 blogs to perform their experiments. Two human experts in the Urdu language were hired to annotate the sentences as negative, positive, and neutral. When there was disagreement between two annotators, a third expert was also hired to resolve the difference. For verification, the inter-annotator agreement is calculated by using Kappa statistic [ 31 ]. To create the Urdu lexicon, the positive and negative words were collected from three sources. ( https://chaoticity.com/urdusentimentlexicon/ ), ( https://sites.google.com/site/datascienceslab/projects/multilingualsentiment ), ( http://urdulughat.info/ ). A total number of 11,739 negative and 9,578 positive words were selected. Lexicon-based method raised the accuracy of the machine learning from 73.88% to 89.03%.

Mehmood et al. [ 32 ] proposed a discriminative feature spamming method for Roman Urdu sentiment analysis. The study collected 11,000 Roman Urdu reviews from many blogs and social media sites. They used a multi-annotator approach to label the dataset. The proposed approach improved the performance of standard machine learning algorithms. The research study [ 33 ] collected Roman Urdu comments from websites and annotated them manually as negative or positive. The final annotated dataset consisted of 400 positive and 406 negative comments, and three machine learning algorithms were used for classification, namely, Naive Bayes, Logistic Regression with Stochastic Gradient Descent, and Support Vector Machine. The experiment concluded that SVM, with an accuracy of 87.22%, performed better than the other classifiers. The authors in paper [ 34 ] proposed a Markov chains approach for Urdu sentiment analysis. The proposed method consists of manual and probabilistic steps to label the dataset. Initially, 1,400 Urdu tweets were manually annotated by human experts; these were then used to label the remaining 1,703 tweets. The Markov chains method was trained on the labeled data and used to predict scores for the 1,703 unlabeled samples. If the prediction score was more than 80%, the tweet was assigned the predicted polarity; otherwise, it was labeled manually. The final dataset comprised 328 positive, 1,604 negative, and 1,171 neutral tweets. Furthermore, the proposed method, machine learning, and lexicon-based approaches were evaluated on test data. Experimental results revealed that the proposed approach outperformed the other approaches.

A few recent studies [ 35 – 37 ] have also used a multi-lingual approach to develop datasets for Urdu sentiment analysis. Mukund et al. [ 37 ] proposed the structural correspondence learning method for Urdu sentiment analysis. The study used the IIIT POS Hindi dataset, which was already in Latin script format. The Hindi dataset has many pure Sanskrit words that need to be replaced by Urdu words; this replacement was done using online dictionaries ( http://www.urduword.com/ ), ( https://hamariweb.com/ ), and manual lookup. The libSVM algorithm was used for text classification and produced an F-measure of 64.3%. Asghar et al. [ 35 ] used the multi-lingual approach to develop a lexicon-based dataset for Urdu sentiment analysis. They extracted the adjectives from the Urdu text using Urdu POS tagging, then translated them into English using a multi-lingual Urdu-to-English dictionary. The SentiWordNet lexicon [ 38 ] was used to get a sentiment score for the translated English adjectives.

Syed et al. [ 39 ] proposed a lexicon-based approach for Urdu sentiment analysis. They collected 435 movie reviews and 318 product reviews from different websites. An Urdu sentiment lexicon was used to assign sentiment polarity to the reviews. Experimental results show that the model produces 72% accuracy on movie reviews and 78% on product reviews. In the research paper [ 40 ], the authors proposed a Roman Urdu opinion mining system. Mobile phone reviews were collected from Whatmobile ( https://www.whatmobile.com.pk/ ), and Bing translator was used to translate these reviews into English. SharpNLP was used for POS tagging, and adjectives were selected as opinion words. An adjective lexicon was developed manually to assign sentiment polarity to the reviews. A summary of the related work is given in Table 1 .

https://doi.org/10.1371/journal.pone.0290779.t001

Many researchers have also worked on Roman Urdu sentiment analysis. Roman Urdu is written in Latin (English) characters, whereas standard Urdu is written in the Urdu script. Hussain et al. [ 41 ] used an LSTM model to perform Roman Urdu sentiment analysis. The authors compared the proposed model with Naive Bayes, Random Forest, and Support Vector Machine; the deep learning model outperformed the machine learning models. In the research paper [ 32 ], the authors collected 11,000 Roman Urdu reviews, labeled them manually, and proposed a novel term weighting technique, called the discriminative feature spamming technique (DFST), for sentiment analysis. Lal et al. [ 42 ] collected 9,601 Roman Urdu reviews from the web and assigned them a negative or positive sentiment polarity. For sentiment analysis, the authors proposed deep learning and machine learning models.

3. Urdu tweet dataset

SentiUrdu-1M, proposed in this paper, is the first of its kind, a large-scale dataset of tweets in the Urdu language. It facilitates researchers in the field of NLP to perform tweet analysis and to evaluate the existing models and techniques for their accuracy in processing Urdu language text.

The search query fetches tweets that were posted in the Urdu language between specified dates and do not contain links. The extracted raw data contains 72 columns, which describe the tweets, the users who posted them, retweet information, and timestamps. We retained the three columns suitable for our purpose: tweet id, tweet text, and tweet creation date.

To make this dataset suitable for machine learning models, we pre-processed the tweet text to remove unnecessary punctuation, spaces, characters, and symbols, as well as hashtags and user mentions. Because this large-scale Urdu dataset is intended to improve sentiment analysis models for low-resource languages, we also extracted the emojis from the tweet text: emojis represent natural human expression very neatly [ 43 ], so the sentiment of a user can be inferred from the emojis posted in a tweet. Emojis are extracted by a Python script that searches the text for each entry in a list of the 751 most frequently used emojis identified by [ 44 ]. These 751 emojis are further classified into categories based on what they represent: joy, sadness, fear, surprise, disgust, and anger.
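
The pre-processing and emoji extraction steps could be sketched roughly as follows. This is not the authors' script: the three-entry emoji table stands in for the 751-emoji list of [ 44 ], and the regular expressions and function names are assumptions made for illustration.

```python
import re
import string

# Hypothetical three-entry stand-in for the 751-emoji list of [44].
EMOJI_CATEGORY = {
    "\U0001F602": "joy",      # face with tears of joy
    "\U0001F622": "sadness",  # crying face
    "\U0001F621": "anger",    # pouting face
}

# Hashtags and user mentions, e.g. "#tag" or "@user".
MENTION_OR_HASHTAG = re.compile(r"[@#]\w+")
# ASCII punctuation plus the Urdu comma, full stop, and question mark.
PUNCTUATION = re.compile("[" + re.escape(string.punctuation + "،۔؟") + "]")


def extract_emojis(text):
    """Return the listed emojis that occur in the text, in order."""
    return [char for char in text if char in EMOJI_CATEGORY]


def clean_tweet(text):
    """Strip mentions, hashtags, listed emojis, punctuation, and extra spaces."""
    text = MENTION_OR_HASHTAG.sub(" ", text)
    text = "".join(char for char in text if char not in EMOJI_CATEGORY)
    text = PUNCTUATION.sub(" ", text)
    return " ".join(text.split())
```

Emojis are extracted before cleaning so that the emoji column and the cleaned text column can be stored side by side for each tweet.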

Users in this dataset used many different emojis, but the emoji “Face with tears of joy” was by far the most frequent, appearing around 264,976 times. Fig 1 presents the top 10 most frequently used emojis.

https://doi.org/10.1371/journal.pone.0290779.g001

In the final dataset, each tweet record contains the tweet id, tweet text, emoji in the tweet text, sentiment score of the emoji, and category of the emoji. The tweet id uniquely identifies each tweet record. The tweet text is the content posted by the user; tweet length mostly ranges from 3 to 280 characters. The distribution of the dataset according to tweet length is presented in Fig 2 . A dataset snippet is shown in Fig 3 .

https://doi.org/10.1371/journal.pone.0290779.g002

https://doi.org/10.1371/journal.pone.0290779.g003

3.1 SentiUrdu-1M exploratory data analysis

This study explored the Urdu text to present essential insights from the data. We first identified the most frequent tokens used in the Urdu tweets dataset. Next, we manually read those tokens to extract the 10,000 most frequent Urdu tokens. Some frequent words are depicted in Fig 4 , while the complete list of the top 10,000 most frequent tokens is publicly available on Google Drive ( https://drive.google.com/file/d/1FIGdH4ypRSkdrhPOmXBfV-6kf4Czt4yw/view ) for researchers working on the Urdu language. The shared Google sheet also contains the POS tags and lemmas of the top 10,000 Urdu tokens. Further manual analysis was done on the dataset to find whether Urdu contains word inflexions. We found many examples of word inflexions by comparing the Urdu tokens with their lemmas, as shown in Fig 5 . For example, the Urdu word at SNO 1 in Fig 5 means “things” before lemmatization, but the lemmatizer changes it to “thing”; similarly, the word “children” changes to “child”. When the lemmatizer is applied to the word “foods”, the word turns into “eat”, which suggests that lemmatization in Urdu can also change the POS tag of a word (food: noun to eat: verb). This analysis suggests that using the original tokens instead of lemmas will be better for the Urdu language.

https://doi.org/10.1371/journal.pone.0290779.g004

https://doi.org/10.1371/journal.pone.0290779.g005
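
The token frequency step of the exploratory analysis can be approximated with a simple counter. This is a minimal sketch that assumes whitespace tokenization of already cleaned tweets; the function name is hypothetical and the manual review described above is not modeled.

```python
from collections import Counter


def most_frequent_tokens(tweets, top_n=10_000):
    """Count whitespace-separated tokens and return the top_n (token, count) pairs."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet.split())
    return counts.most_common(top_n)
```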

4. Methodology

4.1 Dataset annotation

Supervised learning algorithms require an annotated dataset, and the SentiUrdu-1M tweet dataset was not initially labeled with the sentiment classes positive, negative, and neutral. The dataset must therefore be annotated with sentiment polarity before it can be used for Urdu sentiment classification. There are two main approaches to text data labeling.

  • Manual Labeling: This approach requires human experts in the corresponding language to label the text data.
  • Automatic Labeling: A script is written to label the data automatically, avoiding manual work.

In order to avoid tedious and error-prone manual labeling, this paper proposes four different auto-labeling approaches, explained below.

4.2 Dataset labeling approach 01: Weakly supervised

The process of auto-labeling tweets’ sentiment polarity through a weakly supervised approach is shown in Fig 6 . It considers two inputs for deciding the sentiment polarity. The first input comes from SentiWordNet [ 38 ], a large English lexical resource that provides sentiment polarity scores for words. We start by translating Urdu tweets to English using the Google Translation API and extracting adjectives, adverbs, verbs, and nouns from the English text. These words are then looked up in SentiWordNet for sentiment scores. Based on the cumulative polarity score, we assign the sentiment polarity P1 a positive, negative, or neutral label.

https://doi.org/10.1371/journal.pone.0290779.g006
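
The cumulative scoring behind P1 might look like the sketch below. It assumes that translation and POS filtering have already produced a list of English content words, uses a small hypothetical score table in place of SentiWordNet, and assumes a decision threshold of zero; none of these details are specified in the paper.

```python
# Hypothetical (positive, negative) scores in the style of SentiWordNet entries.
TOY_SCORES = {
    "good": (0.75, 0.0),
    "happy": (0.875, 0.0),
    "bad": (0.0, 0.625),
    "sad": (0.125, 0.75),
}


def polarity_p1(content_words, scores=TOY_SCORES):
    """Sum (positive - negative) over the words and map the total to a label."""
    total = 0.0
    for word in content_words:
        positive, negative = scores.get(word, (0.0, 0.0))
        total += positive - negative
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

Words missing from the table contribute nothing, so a tweet with no scored words falls into the neutral class under this assumption.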

Similarly, the second input, P2, is based on the emoticons present in the Urdu text. We extract the emoticons from the text and look up their polarity in the emoticon sentiment score dataset [ 44 ]. The polarity P2 is positive, negative, or neutral, depending on the emoticon score from that dataset.

If P1 and P2 are the same for an input tweet, we retain the tweet in our partially labeled dataset, because two different approaches voting for the same polarity gives us considerable confidence in the label. If P1 and P2 differ, we set the tweet aside at this stage and consider it in the second round of the tagging process. Sentiment polarities P1 and P2 were the same for 414,307 tweets out of 1,140,823.
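
The agreement rule can be written in a few lines. The sketch below assumes each tweet arrives as a (tweet id, first polarity, second polarity) triple; the function name and data layout are illustrative only.

```python
def split_by_agreement(tweets):
    """Split (tweet_id, p1, p2) triples into agreed labels and deferred ids.

    Tweets whose two polarities match keep that label; the rest are set
    aside for the second labeling round.
    """
    agreed = {}
    deferred = []
    for tweet_id, p1, p2 in tweets:
        if p1 == p2:
            agreed[tweet_id] = p1
        else:
            deferred.append(tweet_id)
    return agreed, deferred
```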

In the second round, we split these 414,307 tweets into train and test sets of 80% and 20%, respectively. Several sentiment classification experiments were performed on this split using deep learning models including LSTM, BiLSTM, and Conv1D. All deep learning models produced almost the same results, as shown in Table 2 . We then trained a BiLSTM model on these 414,307 tweets and used it to predict the polarity of the remaining tweets, thereby labeling the whole dataset. Table 3 shows the distribution of the dataset among the three labels. The sample of tweets in Fig 3 shows that the emoticons used and the text of the tweet convey the same sentiment and emotion polarity.

https://doi.org/10.1371/journal.pone.0290779.t002

https://doi.org/10.1371/journal.pone.0290779.t003
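
The control flow of the second round (train on the agreed tweets, then predict labels for the rest) is sketched below. The trainer here is a deliberately trivial majority-class placeholder, not a BiLSTM; both function names are hypothetical, and a real run would pass in a function that trains the actual model.

```python
from collections import Counter


def majority_class_trainer(examples):
    """Placeholder 'model': always predicts the most common training label."""
    most_common_label = Counter(label for _, label in examples).most_common(1)[0][0]
    return lambda text: most_common_label


def label_remaining(agreed, deferred_texts, train_fn):
    """Train on agreed (text, label) pairs and label the deferred texts."""
    predict = train_fn(agreed)
    labeled = list(agreed)
    labeled.extend((text, predict(text)) for text in deferred_texts)
    return labeled
```

Passing the trainer as a parameter keeps the labeling loop independent of the model, so any classifier that returns a predict function can be plugged in.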

4.3 Dataset labeling approach 02: VADER (Valence Aware Dictionary and sEntiment Reasoner)

VADER [ 48 ] is a pre-built lexicon as well as a rule-based framework for performing sentiment analysis. It is usually used to assign initial sentiment labeling to the text. A sentiment lexicon is a dictionary where the words are annotated with sentiment scores between -1 and 1. VADER is also able to aggregate sentiment scores of a complete sentence by taking individual scores of words. We used this framework directly from the NLTK package of the Python Programming Language to assign sentiment labels (positive, negative, and neutral) to our Urdu tweets dataset. Fig 7 shows the annotation process of our tweets using this approach. Table 3 shows the individual count of all three sentiment labels assigned using this approach for the tweets.

https://doi.org/10.1371/journal.pone.0290779.g007
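
VADER reports a normalized compound score per text, which in NLTK is available as `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`. A minimal sketch of turning that score into one of the three labels follows; the cut-offs of ±0.05 are the conventional values suggested in VADER's own documentation and are an assumption here, since the paper does not state its thresholds.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a three-way sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

Text containing no words from the lexicon receives a compound score of 0 and is therefore labeled neutral, which is consistent with the large neutral class reported for this approach.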

4.4 Dataset labeling approach 03: TextBlob

TextBlob [ 49 ] is a Python library for processing text data. It provides a simple API for various NLP tasks such as part-of-speech tagging, text classification, noun-phrase extraction, and sentiment analysis. For sentiment analysis, it uses the sentiment lexicon of the pattern library to assign sentiment labels (positive, negative, and neutral) to the text. We used this library to assign these three sentiment labels to our Urdu tweets dataset. Fig 8 shows the annotation process of our tweets using this approach. Table 3 shows the individual count of all three sentiment labels assigned using this approach for the tweets.

https://doi.org/10.1371/journal.pone.0290779.g008

4.5 Dataset labeling approach 04: BERT

This study uses a BERT-based multilingual uncased model ( https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment ) to incorporate an approach from transformer-based models. This version of BERT is fine-tuned for sentiment analysis on a product reviews dataset. The dataset contains reviews in six languages, namely English, French, German, Dutch, Italian, and Spanish. Since this model is not trained on Urdu, we first translate the tweets into English using Google Translate ( https://translate.google.com/ ) and then provide the translations as input to the BERT model to predict the sentiment of the text. The results are depicted in Table 3 .
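
The referenced model predicts a review rating as a label from "1 star" to "5 stars", so a mapping to three polarity classes is needed. The paper does not describe this mapping; the sketch below assumes one plausible choice (1 to 2 stars negative, 3 stars neutral, 4 to 5 stars positive), and the function name is hypothetical.

```python
def label_from_stars(star_label):
    """Map a label such as '4 stars' to a three-way polarity (assumed mapping)."""
    stars = int(star_label.split()[0])
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"
```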

5. Experimental settings

This section shows different experimental settings we used including deep learning model configurations, training, and test datasets with the relevant evaluation metrics.

5.1 Baseline models parameters

Table 4 shows the configuration parameters for all the baseline models we used for the baseline experiments.

https://doi.org/10.1371/journal.pone.0290779.t004

5.2 Evaluation metrics

Evaluation metrics are used to evaluate the performance of the system. The most commonly used evaluation metrics for text classification are Precision, Recall, F1-Score, Accuracy, and the Kappa score. The mathematical representation of these metrics is given in Eqs 2 , 3 , 4 , 5 and 6 , respectively. All of these metrics are calculated from true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Together, these numbers make up the confusion matrix shown in Fig 9 .

https://doi.org/10.1371/journal.pone.0290779.g009
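
For reference, the standard textbook definitions of these five metrics in terms of TP, FP, TN, and FN are given below. The notation may differ from the paper's own equations; in the kappa formula, p_o denotes the observed agreement (the accuracy) and p_e the agreement expected by chance, both symbols being introduced here for illustration.

```latex
\begin{align}
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
\text{F1-Score}  &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \\
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\kappa           &= \frac{p_o - p_e}{1 - p_e}
\end{align}
```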

5.3 Dataset split

Table 2 shows the class-wise as well as overall statistics of the dataset labeled by three different polarity assessment approaches. To carry out the baseline experiments, we divided each dataset into three sets: 1) a training set with 70% of the original dataset, 2) a validation set with 15%, and 3) a test set with 15%. The baseline models are trained using the training and validation sets, and the trained models are then evaluated on the test set. Tables 5 and 6 show class-wise as well as total statistics of these three sets for the datasets labeled by the polarity assessment approaches.

https://doi.org/10.1371/journal.pone.0290779.t005

https://doi.org/10.1371/journal.pone.0290779.t006
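
A 70/15/15 split of this kind can be sketched as below. This is an illustrative random split with a fixed seed; whether the authors shuffled or stratified by class is not stated, and the function and parameter names are assumptions.

```python
import random


def split_dataset(records, train_pct=70, validation_pct=15, seed=42):
    """Shuffle records and split them into train, validation, and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_validation = len(shuffled) * validation_pct // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # remainder, about 15%
    return train, validation, test
```

Integer percentages avoid floating-point rounding when computing the split sizes, and the test set takes whatever remains so that no record is dropped.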

6. Results & discussion

This section presents the results of the deep learning models on the polarity assessment approaches discussed in Section 4. We also tried different conventional machine learning algorithms, but their overall results were very poor, so we do not report them. Moreover, the baseline models with domain embeddings performed better than models with general-purpose embeddings, i.e., FastText. One possible reason is that FastText is trained on Wikipedia text, which is formal and mostly written by professionals, whereas Twitter data is a mixture of formal and informal language.

6.1 Weakly supervised dataset results

This section shows the results of all the baseline experiments performed on a weakly supervised dataset. The given results are computed against the test set of the dataset. Table 11 shows the precision and recall results and Table 7 depicts the F1-score, accuracy, and kappa score values of individual class labels as well as the whole test set.

https://doi.org/10.1371/journal.pone.0290779.t007

As we can see from Table 3 , the dataset is highly imbalanced because the majority of the instances lie in the positive class, so it is interesting to see whether we also obtain satisfactory precision for the classes with few instances. The neutral class has the fewest instances compared to the positive and negative classes. Looking at the individual and average precision results in Table 11, the top-performing baseline models are LSTM, BiLSTM, and Conv1D, with values above 90%. The DNN and RNN models performed worse than the other models. A possible reason for the low performance of DNN is that it processes a single unit of information at a time and is inefficient at processing sequential information such as text sequences. The RNN, which has the lowest performance, is limited by its inability to process and understand long text sequences. The top-performing models are able to overcome these problems while processing text sequences.

Furthermore, if we observe the individual as well as average recall values in Table 11, the same models (LSTM, BiLSTM, and Conv1D) are the top performers, while DNN and RNN have lower recall values than the other models. This gives more confidence in the performance of the top models (LSTM, BiLSTM, and Conv1D).

Next, we calculated the F1-score, accuracy, and kappa scores of the baseline experiments for the test set of the weakly supervised dataset. The results are given in Table 7 . Here, the top-performing models are again the same (LSTM, BiLSTM, and Conv1D), with F1-score and accuracy values above 95%. The values that deserve particular attention are the kappa scores, which show the level of agreement between the original and predicted class labels. The overall kappa scores are above 80%, which indicates that the predicted class labels are very close to the original class labels of the dataset. Hence, this gives more confidence in the validity of the baseline experiment results.
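
To make the kappa score concrete, the sketch below computes accuracy and Cohen's kappa from a multi-class confusion matrix using the standard definition. The function name and matrix layout are assumptions for illustration; the paper does not say which implementation was used.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of items of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of items on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa
```

Because kappa subtracts the agreement expected by chance, it is a stricter measure than accuracy on imbalanced datasets such as the ones used here.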

6.2 VADER dataset results

This section explains the results of all the baseline experiments performed on the VADER dataset. Table 12 depicts the precision and recall results, while Table 8 shows the F1-score, accuracy, and kappa score results. Unlike the previous dataset, this dataset has the majority of its instances in the neutral class, so it is worth analyzing the overall results as well as the results for the classes with fewer instances, i.e., positive and negative. The results are computed on the test set of the dataset. The precision results for the neutral class are 100% due to having the majority of the instances. Although the positive and negative classes have fewer instances, the top-performing models (LSTM, BiLSTM, and Conv1D) achieve precision above 90% and 80% for the positive and negative classes, respectively. Moreover, the overall precision and recall results for these models are above 90% and 80%, respectively. Again, the low-performing models are DNN and RNN.

Table 8. https://doi.org/10.1371/journal.pone.0290779.t008

Furthermore, the results in Table 8 give strong evidence in favor of the top-performing models (LSTM, BiLSTM, and Conv1D): their F1-score and accuracy values again lead those of the DNN and RNN models. The kappa scores again show very strong agreement between the original and predicted class labels. The shift of the majority of the instances from the positive to the neutral class in this dataset has not affected the overall performance of the models.

6.3 TextBlob dataset results

This section shows the results of all the baseline experiments performed on the TextBlob dataset. Table 13 shows the precision and recall results, and Table 9 shows the F1-score, accuracy, and kappa score results. Again, this dataset has the majority of its instances in the neutral class. The results are computed on the test set of the dataset. As discussed in the previous section, it is most informative to examine how the models perform on the classes with few instances rather than on the majority class. Continuing the pattern of the previous baseline experiments, the LSTM, BiLSTM, and Conv1D models again lead in precision and recall, both for individual classes and for the whole test set, with values of 90% or higher and 80% or higher, respectively. The low-performing models are DNN and RNN. Table 9 shows the same top-performing and low-performing models in terms of F1-score, accuracy, and kappa score. Here, too, the concentration of instances in the neutral class did not affect the overall performance of the models.

Table 9. https://doi.org/10.1371/journal.pone.0290779.t009

6.4 BERT dataset results

Compared to the other approaches, the BERT results are poor, as presented in Table 10. The kappa scores indicate the weakest agreement between original and predicted labels, and the F1-score and accuracy are significantly lower than those of our proposed weakly supervised method. A possible reason is that the bert-base-multilingual-uncased-sentiment model is fine-tuned on a wide range of product reviews written in six languages: English, Dutch, German, French, Spanish, and Italian (https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment). In our study, we aimed to tackle the challenge of sentiment analysis for content in the Urdu language. To address this, we first translated the Urdu tweets into English and then used the BERT-based model to predict the sentiment of the Urdu content.

Table 10. https://doi.org/10.1371/journal.pone.0290779.t010

Our research findings show that simply fine-tuning a BERT-based model on languages with rich linguistic resources does not necessarily lead to improved performance on languages with fewer resources. This is the case even if we try to bridge the gap by translating resource-poor data into a language with richer resources. We explored this issue in detail in one of our earlier research papers [52]. In contrast to BERT, the proposed weakly supervised method also uses emoticons to identify the sentiment polarity of Urdu tweets, which the other models do not take into consideration.

6.5 Comparison of classification results between labelling approaches

This section compares the F1-score and kappa score values of the best-performing models for all four polarity assessment approaches. We chose the F1-score because it balances precision and recall, and the kappa score because it measures the agreement between predicted and original class labels. Fig 10 shows this comparison.
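For reference, the standard per-class definitions behind the F1-score can be written out as follows. The symbols $TP_c$, $FP_c$, and $FN_c$ (true positives, false positives, and false negatives for class $c$) are notation introduced here for illustration, and the worked numbers are hypothetical; the averaging scheme used for the paper's tables is not restated in this section.

```latex
\begin{align}
P_c &= \frac{TP_c}{TP_c + FP_c}, &
R_c &= \frac{TP_c}{TP_c + FN_c}, &
F1_c &= \frac{2\,P_c\,R_c}{P_c + R_c}.
\end{align}
% Hypothetical worked example: P_c = 0.90 and R_c = 0.80 give
% F1_c = (2 \times 0.90 \times 0.80) / (0.90 + 0.80) = 1.44 / 1.70 \approx 0.847.
```

Because the F1-score is a harmonic mean, it sits closer to the lower of the two values, so a model cannot reach a high F1-score on a class by doing well on precision alone or on recall alone.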

Fig 10. https://doi.org/10.1371/journal.pone.0290779.g010

From Fig 10 we can observe that the best models for all four polarity assessment approaches (weakly supervised, VADER, TextBlob, and BERT) with respect to F1-score and kappa score are LSTM, BiLSTM, and Conv1D. Also, there is very little difference in F1-score and kappa score values among the weakly supervised, VADER, and TextBlob approaches. This gives strong evidence that the polarity assessment approach used to assign polarities does not affect the overall performance and learning of the deep learning models.

VADER and TextBlob label most tweets as neutral. This is primarily because these tools do not consider emoticons when assigning polarity, which is their main disadvantage. Still, a high correlation was observed between the two approaches.

TextBlob also uses Google Translate to translate low-resource languages such as Urdu into English before generating a polarity class for the input text (https://thinkinfi.com/natural-language-processing-using-textblob/). Our recent study [52] showed that Google Translate causes performance degradation for low-resource languages, so we do not report extended results here to demonstrate it again; readers are referred to [52] for details. The TextBlob lexicon considers only the text when assigning polarity. It does not consider emoticons, so effectively detecting sarcasm, negation, ambiguous words, phrases and idioms, and slang in low-resource languages is often challenging. These phenomena are very important and can change the polarity of low-resource text. The same holds for VADER, an English rule-based lexicon that uses the machine translation tool “My Memory Translation Service” (http://mymemory.translated.net) to generate the polarity for non-English text (https://github.com/cjhutto/vaderSentiment).
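Lexicon tools of this kind return a continuous polarity score that must be cut into three classes. The sketch below shows one common way to do that; it is an assumption for illustration, not the procedure used in this study, which does not state its thresholds. The cut-off of 0.05 follows the convention suggested in the VADER README for its compound score, and applying the same cut-off to a TextBlob polarity value is purely hypothetical. The function name `polarity_class` is also hypothetical.

```python
def polarity_class(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a three-way sentiment class.

    The default threshold is an assumed convention, not a value from the paper.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical scores: a clearly positive text, a clearly negative text,
# and a text whose sentiment-bearing words were not recognized.
examples = [0.62, -0.48, 0.0]
classes = [polarity_class(s) for s in examples]
print(classes)
```

Under such a rule, any text whose sentiment cues are lost in translation or carried only by emoticons scores near zero and falls into the neutral class, which is consistent with the neutral-heavy label distribution described above.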

The detailed results of the different labeling approaches can be found in Tables 11–14. Compared with the other methods, our proposed weakly supervised learning approach stands out for being more balanced and fair. This is evident from the precision and recall values for each class in Table 11. In particular, the BiLSTM model shows the best performance among all models, achieving well-rounded performance across the different classes and an overall average of 91%.

Table 11. https://doi.org/10.1371/journal.pone.0290779.t011

Table 12. https://doi.org/10.1371/journal.pone.0290779.t012

Table 13. https://doi.org/10.1371/journal.pone.0290779.t013

Table 14. https://doi.org/10.1371/journal.pone.0290779.t014

Taking a closer look at the outcomes for VADER and TextBlob in Tables 12 and 13, respectively, it is clear that both approaches tend to favor the neutral class. This trend is supported by the findings in Table 5, where most Urdu tweets are labeled as neutral by both VADER and TextBlob. This bias stems from the fact that VADER and TextBlob rely on English language patterns and use translation to handle non-English text. The results from the BERT-based model, on the other hand, do not show a strong bias towards any particular class. However, the performance of the BERT-based model is noticeably weaker than that of our proposed weakly supervised method, as highlighted in Table 14.

Table 15 provides an overview of the best-performing models for each labeling technique. Among these techniques, TextBlob stands out with the best F1-score and accuracy. The weakly supervised approach and VADER have similar F1-scores, but VADER has higher accuracy. BERT, on the other hand, does not perform as well as the other methods.

Table 15. https://doi.org/10.1371/journal.pone.0290779.t015

It is important to note that VADER and TextBlob show higher accuracy because a large share of their instances are categorized as neutral. This large number of neutral instances introduces bias, which leads to inflated accuracy scores for both approaches. This becomes clearer in Tables 12 and 13, where the results for the positive and negative classes are not as good as those of the proposed weakly supervised approach.
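The effect of class imbalance on accuracy can be illustrated with a small sketch. The class counts and the always-neutral classifier below are hypothetical and chosen only to make the effect visible; they are not the paper's data or models.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for each class that occurs in y_true."""
    recalls = {}
    for c in sorted(set(y_true)):
        total = sum(t == c for t in y_true)
        hits = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        recalls[c] = hits / total
    return recalls


# Hypothetical test set dominated by the neutral class.
y_true = ["neutral"] * 90 + ["positive"] * 5 + ["negative"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["neutral"] * 100

acc = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)
macro_recall = sum(recalls.values()) / len(recalls)
print(f"accuracy = {acc:.2f}, per-class recall = {recalls}, macro recall = {macro_recall:.2f}")
```

The degenerate classifier reaches 90% accuracy while never identifying a single positive or negative instance, which is why the per-class precision and recall tables are the more informative comparison here.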

6.6 Comparison between human-labeled and automatically labeled tweets

The dataset proposed in this study contains more than 1 million tweets, so manually annotating the whole dataset is impractical. Therefore, we manually labeled 400 tweets to report a human analysis of the dataset: 164 tweets were annotated as positive, 158 as negative, and 78 as neutral. We then compared the manually assigned labels with the labels produced by the automatic labeling methods discussed in this paper: weakly supervised, VADER, and TextBlob. The labeling similarity between the human labels and the proposed weakly supervised approach is about 51.5%, which is better than the 21.0% and 25.0% obtained for VADER and TextBlob, respectively.
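A minimal sketch of this comparison is given below, assuming that "labeling similarity" means the fraction of tweets on which the automatic label equals the human label; the text does not define the measure more precisely. The function name `agreement_rate` and the toy label lists are hypothetical. The class counts and percentages are the ones reported above, and the implied match counts are simple arithmetic on them.

```python
def agreement_rate(human_labels, automatic_labels):
    """Fraction of items on which the automatic label equals the human label."""
    if len(human_labels) != len(automatic_labels) or not human_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(h == a for h, a in zip(human_labels, automatic_labels))
    return matches / len(human_labels)


# Class counts of the manually labeled sample, as reported in the text.
human_counts = {"positive": 164, "negative": 158, "neutral": 78}
total = sum(human_counts.values())

# Reported similarity with the human labels, expressed as fractions.
reported = {"weakly supervised": 0.515, "VADER": 0.210, "TextBlob": 0.250}
# Approximate number of matching tweets implied by each percentage.
implied_matches = {name: round(rate * total) for name, rate in reported.items()}

# Toy example of the computation itself (labels are illustrative only).
toy_human = ["positive", "negative", "neutral", "positive"]
toy_auto = ["positive", "neutral", "neutral", "negative"]
toy_rate = agreement_rate(toy_human, toy_auto)
print(total, implied_matches, toy_rate)
```

On a 400-tweet sample, the reported percentages would correspond to roughly 206, 84, and 100 matching labels for the weakly supervised, VADER, and TextBlob approaches, respectively.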

7. Conclusion and future work

This article proposed a new dataset, SentiUrdu-1M, a large-scale tweet dataset for Urdu text sentiment analysis. The article also sets baseline results on the SentiUrdu-1M dataset for future researchers to build on. The Urdu language, despite being spoken by more than 270 million people around the world, is still considered a resource-poor language from a machine learning perspective. Only a handful of datasets with a few thousand instances are available for Urdu text processing, which makes Urdu a poor candidate for processing with state-of-the-art deep learning algorithms. SentiUrdu-1M should prove to be a leap forward for Urdu text processing, especially from a sentiment analysis perspective.

The article also proposed an automated instance-labeling approach using SentiWordNet and emoticons extracted from the text. The proposed approach is generalizable and can be used to label tweets in other natural languages as well, because the language of emoticons is universal and the corresponding emotions have the same meaning in all human languages. A smiling face is positive in Urdu as well as in Thai, Norwegian, or any other natural language.
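The general idea of combining a word-level lexicon with emoticons can be sketched as follows. This is not the authors' labeling rule, which is defined earlier in the paper and not restated in this section: the emoticon sets, the toy English lexicon standing in for SentiWordNet-style scores, the equal weighting, and the function name `label_tweet` are all hypothetical assumptions made for illustration.

```python
# Hypothetical emoticon sets; a real system would use a much larger inventory.
POSITIVE_EMOTICONS = {"😀", "😊", "😍", "👍", ":)", ":-)"}
NEGATIVE_EMOTICONS = {"😢", "😡", "😞", "👎", ":(", ":-("}

# Hypothetical word scores standing in for a SentiWordNet-style lexicon.
TOY_LEXICON = {"good": 0.6, "great": 0.8, "bad": -0.6, "terrible": -0.8}


def label_tweet(text, lexicon=TOY_LEXICON):
    """Assign a polarity label from emoticons and lexicon word scores."""
    score = 0.0
    for token in text.split():
        if token in POSITIVE_EMOTICONS:
            score += 1.0  # assumed weight for a positive emoticon
        elif token in NEGATIVE_EMOTICONS:
            score -= 1.0  # assumed weight for a negative emoticon
        else:
            score += lexicon.get(token.lower(), 0.0)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


labels = [
    label_tweet("great match 😀"),
    label_tweet("bad day 😢"),
    label_tweet("just landed"),
    label_tweet("mujhe nahi pata 😀"),  # no lexicon hit; the emoticon decides
]
print(labels)
```

The last example shows why the emoticon signal transfers across languages: even when none of the words are covered by the lexicon, the emoticon alone yields a polarity label.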

SentiUrdu-1M can also be used to train models such as LSTM or GPT-2 for Urdu text generation. With the recent advent of attention-based transformer models, the AI community's dream of generating synthetic text has come true, but it is mostly limited to English and a few other resource-rich languages. SentiUrdu-1M has the potential to cause a similar disruption in the Urdu language. This might turn out to be a first small step towards an Urdu-talking robot.

SentiUrdu-1M can also be used to produce a generic pre-trained Urdu word embedding, along the lines of the GloVe-Twitter word embedding for English tweets. Such an embedding would be useful in Urdu text classification, summarization, seq2seq modeling, and other natural language processing tasks.

Supporting information

S1 Appendix.

https://doi.org/10.1371/journal.pone.0290779.s001

  • 2. Mohammad S, Dunne C, Dorr B. Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus. Proceedings of the 2009 conference on empirical methods in natural language processing 2009 Aug (pp. 599–608).
  • 3. Edalati M, Imran AS, Kastrati Z, Daudpota SM. The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. Intelligent Systems and Applications: Proceedings of the 2021 Intelligent Systems Conference (IntelliSys) Volume 3 2022 (pp. 11–22). Springer International Publishing.
  • 4. Andreevskaia A, Bergler S. Mining wordnet for a fuzzy sentiment: Sentiment tag extraction from wordnet glosses. 11th conference of the European chapter of the Association for Computational Linguistics 2006 Apr (pp. 209–216).
  • 5. Esuli A, Sebastiani F. Determining term subjectivity and term orientation for opinion mining. 11th Conference of the European chapter of the association for computational linguistics 2006 Apr (pp. 193–200).
  • 6. Esuli A, Sebastiani F. Determining the semantic orientation of terms through gloss classification. Proceedings of the 14th ACM international conference on Information and knowledge management 2005 Oct 31 (pp. 617–624).
  • 7. Ding X, Liu B, Yu PS. A holistic lexicon-based approach to opinion mining. Proceedings of the 2008 international conference on web search and data mining 2008 Feb 11 (pp. 231–240).
  • 8. Sebastiani F, Esuli A. Sentiwordnet: A publicly available lexical resource for opinion mining. Proceedings of the 5th international conference on language resources and evaluation 2006 May 22 (pp. 417–422). European Language Resources Association (ELRA) Genoa, Italy.
  • 9. Dave K, Lawrence S, Pennock DM. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. Proceedings of the 12th international conference on World Wide Web 2003 May 20 (pp. 519–528).
  • 10. Paltoglou G, Thelwall M. A study of information retrieval weighting schemes for sentiment analysis. Proceedings of the 48th annual meeting of the association for computational linguistics 2010 Jul (pp. 1386–1395).
  • 12. Pang B, Lee L, Vaithyanathan S. Thumbs up? Sentiment classification using machine learning techniques. arXiv preprint cs/0205070. 2002 May 28.
  • 15. Sanguansat P. Paragraph2vec-based sentiment analysis on social media for business in thailand. 8th International Conference on Knowledge and Smart Technology (KST) 2016 Feb 3 (pp. 175–178). IEEE.
  • 19. Cirqueira D, Pinheiro MF, Jacob A, Lobato F, Santana Á. A literature review in preprocessing for sentiment analysis for Brazilian Portuguese social media. 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI) 2018 Dec 3 (pp. 746–749). IEEE.
  • 20. Chekima K, Alfred R. Sentiment analysis of Malay social media text. 4th ICCST 2017, Kuala Lumpur, Malaysia, 29–30 November, 2017 2018 (pp. 205–219). Springer Singapore.
  • 23. Liu B, Zhang L. A survey of opinion mining and sentiment analysis. Mining text data 2012 (pp. 415–463). Springer, Boston, MA.
  • 24. Parrott WG, editor. Emotions in social psychology: Essential readings. psychology press; 2001.
  • 25. Maas A, Daly RE, Pham PT, Huang D, Ng AY, Potts C. Learning word vectors for sentiment analysis. Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies 2011 Jun (pp. 142–150).
  • 26. Rane A, Kumar A. Sentiment classification system of Twitter data for US airline service analysis. 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC) 2018 Jul 23 (Vol. 1, pp. 769–773). IEEE.
  • 27. Haque TU, Saber NN, Shah FM. Sentiment analysis on large scale Amazon product reviews. 2018 IEEE international conference on innovative research and development (ICIRD) 2018 May 11 (pp. 1–6). IEEE.
  • 29. Amjad K, Ishtiaq M, Firdous S, Mehmood MA. Exploring Twitter news biases using urdu-based sentiment lexicon. 2017 International Conference on Open Source Systems & Technologies (ICOSST) 2017 Dec 18 (pp. 48–53). IEEE.
  • 36. Amjad M, Sidorov G, Zhila A. Data augmentation using machine translation for fake news detection in the Urdu language. Proceedings of the Twelfth Language Resources and Evaluation Conference 2020 May (pp. 2537–2542).
  • 37. Mukund S, Srihari RK. Analyzing Urdu social media for sentiments using transfer learning with controlled translations. Proceedings of the second workshop on language in social media 2012 Jun (pp. 1–8).
  • 38. Baccianella S, Esuli A, Sebastiani F. Sentiwordnet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In LREC 2010 May 17 (Vol. 10, No. 2010, pp. 2200–2204).
  • 39. Syed AZ, Aslam M, Martinez-Enriquez AM. Lexicon based sentiment analysis of Urdu text using SentiUnits. 9th Mexican International Conference on Artificial Intelligence, MICAI 2010, Pachuca, Mexico, November 8-13, 2010, Proceedings, Part I 9 2010 (pp. 32–43). Springer Berlin Heidelberg.
  • 40. Daud M, Khan R, Daud A. Roman Urdu opinion mining system (RUOMiS). arXiv preprint arXiv:1501.01386. 2015 Jan 7.
  • 49. Laksono RA, Sungkono KR, Sarno R, Wahyuni CS. Sentiment analysis of restaurant customer reviews on TripAdvisor using Naïve Bayes. 12th international conference on information & communication technology and system (ICTS) 2019 Jul 18 (pp. 49–54). IEEE.

Title: Efficient Urdu Caption Generation Using Attention Based LSTM

Abstract: Recent advancements in deep learning have created many opportunities to solve real-world problems that remained unsolved for more than a decade. Automatic caption generation is a major research field, and the research community has done a lot of work on it in most common languages like English. Urdu is the national language of Pakistan and also much spoken and understood in the sub-continent region of Pakistan-India, and yet no work has been done for Urdu language caption generation. Our research aims to fill this gap by developing an attention-based deep learning model using techniques of sequence modeling specialized for the Urdu language. We have prepared a dataset in the Urdu language by translating a subset of the "Flickr8k" dataset containing 700 'man' images. We evaluate our proposed technique on this dataset and show that it can achieve a BLEU score of 0.83 in the Urdu language. We improve on the previous state-of-the-art by using better CNN architectures and optimization techniques. Furthermore, we provide a discussion on how the generated captions can be made correct grammar-wise.

Research: Negotiating Is Unlikely to Jeopardize Your Job Offer

  • Einav Hart,
  • Julia Bear,
  • Zhiying (Bella) Ren

A series of seven studies found that candidates have more power than they assume.

Job seekers worry about negotiating an offer for many reasons, including the worst-case scenario that the offer will be rescinded. Across a series of seven studies, researchers found that these fears are consistently exaggerated: Candidates think they are much more likely to jeopardize a deal than managers report they are. This fear can lead candidates to avoid negotiating altogether. The authors explore two reasons driving this fear and offer research-backed advice on how anxious candidates can approach job negotiations.

Imagine that you just received a job offer for a position you are excited about. Now what? You might consider negotiating for a higher salary, job flexibility, or other benefits , but you’re apprehensive. You can’t help thinking: What if I don’t get what I ask for? Or, in the worst-case scenario, what if the hiring manager decides to withdraw the offer?

  • Einav Hart is an assistant professor of management at George Mason University’s Costello College of Business, and a visiting scholar at the Wharton School. Her research interests include conflict management, negotiations, and organizational behavior.
  • Julia Bear is a professor of organizational behavior at the College of Business at Stony Brook University (SUNY). Her research interests include the influence of gender on negotiation, as well as understanding gender gaps in organizations more broadly.
  • Zhiying (Bella) Ren is a doctoral student at the Wharton School of the University of Pennsylvania. Her research focuses on conversational dynamics in organizations and negotiations.

News Release: DHS S&T Releases Market Survey Report for Non-Detonable Training Aids for Explosive Detection Canines

For Immediate Release: S&T Public Affairs, 202-286-9047

WASHINGTON - The Department of Homeland Security (DHS) Science and Technology Directorate (S&T) has released a new market survey report to help emergency responders identify non-detonable training aids for explosive detection canines. Non-detonable training aids emulate the scent of explosives, allowing canines to learn the specific odor of different types of explosives while eliminating the inherent risks of using traditional, live explosives. They are carefully designed and maintained to create a controlled and safe environment for training, with a focus on safety, effectiveness, and consistency in preparing canines for their crucial roles in security and public safety.

A canine near the rear of a car. Non-Detonable Training Aids for Explosives Detection Canines Market Survey Report. February 2024. S&T and NUSTL logos.

S&T’s National Urban Security Technology Laboratory (NUSTL), in conjunction with Johns Hopkins Applied Physics Laboratory, administered the Non-Detonable Training Aids for Explosive Detection Canines Market Survey Report, which provides information on 12 non-detonable training aid products ranging in price from $15 to $550. This report is based on information gathered from manufacturer and vendor materials, open-source research, industry publications, and a government-issued request for information. The report is part of NUSTL’s System Assessment and Validation for Emergency Responders (SAVER) program, which assists emergency responders in making procurement decisions.

“The Detection Canine Program at S&T plays a critical role in advancing the safety and effectiveness of explosive detection canines in the field,” said Guy Hartsough, S&T Detection Canine Program, program manager. “NUSTL’s comprehensive report provides valuable resources in an ever-evolving landscape of threats, underscoring our dedication to enhancing the capabilities of our nation's security responders.”

“Explosive detection canines are critical to protecting the public as well as the first responders they assist. Rigorous training is required to prepare these dogs for the field,” said NUSTL Director Alice Hong. “NUSTL’s latest market survey report equips explosive detection canine handlers with crucial insights into non-detonable training aids.”

Visit the SAVER website for market research and comparative assessments of commercially available products. Results are published to assist responders in making informed technology deployment and purchasing decisions for their agency’s specific needs. SAVER documents with limited distribution are available to members of the SAVER Community by contacting [email protected] .

For more information about NUSTL and its mission as the only national laboratory dedicated to serving the nation’s first responders, visit www.dhs.gov/science-and-technology/national-urban-security-technology-laboratory

Americans’ Changing Relationship With Local News

1. Attention to local news

The share of Americans who say they follow local news very closely now stands at 22% – a decline of 15 percentage points since 2016, when 37% of U.S. adults said the same.

A bar chart showing fewer Americans are closely following local and national news

Most U.S. adults (66%) still say they follow local news at least somewhat closely , although this number is also down. Roughly eight-in-ten adults (78%) followed local news at least somewhat closely in 2016.

This decline in attention is not unique to local news: The percentage of Americans following national news very closely declined from 33% in 2016 to 22% in 2024. And the share who say they follow the news all or most of the time (whether it is local, national or some other kind of news) dropped from 51% in 2016 to 38% in 2022.

A line chart showing older adults are more likely to follow local news very closely, although attention is waning across all groups

The decline in attention to local news has occurred across demographic groups, though there are still major differences by age. Young adults are much less likely than their elders to say they follow local news: In 2024, just 9% of Americans ages 18 to 29 say they follow local news very closely, compared with 35% of those 65 and older.

But people across all age groups have become less likely to follow local news in recent years. For instance, in 2016, 23% of the youngest adults said they followed local news very closely, and 51% of the oldest adults said the same.

About half of the youngest adults (47%) now say they follow local news at least somewhat closely, while majorities of all other age groups say this.

A table showing across demographic groups, Americans are following local news less

Americans with higher levels of formal education are less likely than those with a high school diploma or less education to follow local news very closely. While 17% of college graduates say they follow local news very closely, 28% of those with a high school education or less say the same.

And while Americans at all levels of education have become less likely to follow local news, this gap has narrowed in recent years. In 2016, there were 23 percentage points between the highest and lowest education categories (24% vs. 47%), compared with an 11-point difference today.

Black Americans are more likely than people in other racial and ethnic groups to follow local news very closely. But there is virtually no difference on this question between Democrats and Republicans (including those who lean toward each party).

About Pew Research Center: Pew Research Center is a nonpartisan fact tank that informs the public about the issues, attitudes and trends shaping the world. It conducts public opinion polling, demographic research, media content analysis and other empirical social science research. Pew Research Center does not take policy positions. It is a subsidiary of The Pew Charitable Trusts.

Copyright 2024 Pew Research Center


CNBC/NRF Retail Monitor, powered by Affinity Solutions April 2024 Report

The CNBC/NRF Retail Monitor provides a first look at how retail sales perform each month. The Retail Monitor leverages Affinity Solutions’ data from more than 140 million credit and debit cards, with nearly nine billion transactions totaling more than $500 billion in annual spending, to measure the monthly and annual change in U.S. retail sales.



Stem cell injections in Mexico can be hazardous. Report identifies US victims.

Health experts are alerting travelers considering medical care abroad about a trio of recent drug-resistant bacterial infections caused by stem cell injections at Mexican clinics.

After stem cell treatments abroad, three Americans became infected with Mycobacterium abscessus, a distant relative of the bacteria that cause tuberculosis and leprosy. In a report published Thursday, U.S. medical experts said they fear additional infections from the injections could have been missed. Two patients shared bacteria with identical genetic material even though their procedures happened in clinics more than 160 miles apart. The incidents have raised concerns about others who sought stem cell injection treatments abroad. The procedure is not approved by the U.S. Food and Drug Administration.

“It's hard to put an exact number, because unfortunately, nobody seems to be monitoring this very closely,” said Dr. Charles Daley, a pulmonologist at National Jewish Health, a hospital in Denver. “There's very little oversight.”

Medical tourism, in which Americans travel abroad for treatment, has been on the rise in recent years. As many as 320,000 U.S. citizens travel internationally for medical care each year, according to the State Department. The Centers for Disease Control and Prevention estimates that the number of medical tourists each year is more likely in the millions. Mexico is a common destination for dental and plastic surgery. There isn't firm data on the prevalence of travel for embryonic stem cell injections, but studies have shown the dangers of undergoing the unproven treatment. Several websites promote what they say are cheap, safe and legal options for injections in Mexico.

Hospitals often refer people with abscessus infections to National Jewish Health’s mycobacterial and respiratory infections division, where Daley is chief. He and other researchers published their findings Thursday afternoon in the CDC’s Morbidity and Mortality Weekly Report.

The abscessus bacterium can cause infections, often in the skin or lungs, that are difficult to treat, even with antibiotics. The infections fester in open wounds or at injection sites. They are often caused by medical devices that haven’t been properly disinfected, and they have been associated with cosmetic surgeries. They can cause boils and pus-filled cysts, according to the CDC. Other symptoms of infection are fever, chills and muscle aches.

In spring 2023, Daley saw an Arizona man in his 60s with an abscessus bone and joint infection on his right elbow after he'd gone for embryonic stem cell injections at a clinic in the Mexican state of Baja California the previous year. 

In October 2022, a Colorado woman in her 30s traveled to a different Baja clinic to get embryonic stem cells injected into her spine to treat multiple sclerosis. She developed headaches and fevers similar to meningitis, an infection that inflames fluid and membranes around the brain and spinal cord. After being treated at the University of Colorado in Aurora later that year, she was referred to National Jewish Health. 

National Jewish Health treated a third case, a Colorado man in his 60s, who received stem cell injections in his knees for osteoarthritis in October 2022 in Guadalajara, an urban hub in central-western Mexico. He subsequently developed infections in both knees. 

Researchers found all three patients had received stem cell injections. They then worked to sequence the bacterium’s genetic material. In the cases of the Arizona man and the Colorado woman who had received injections in Baja, they found the same rare sub-species of the bacterium. The Baja clinics were 167 miles apart. 

The details for the third case, the Colorado man, remain unclear. Daley said cultures for his bacterium strain weren’t saved by Mexican officials. 

Health officials with Colorado, Arizona and the CDC contacted health authorities in Mexico, where staff said they weren’t aware of the infections, Daley said. It doesn’t appear there’s any investigation into a possible outbreak, he added. 

More than a year and a half after their procedures, all three patients are still being treated for their infections. Daley said they are on a combination of antibiotics commonly used to treat pneumonia and leprosy.

Doctors are searching for additional patients who may have developed infections after stem cell injections. Daley said he understands why Americans might opt for cheaper options abroad, but people should have a "buyer beware" notice.

“We understand the pressure to do it,” Daley said. “But it comes with risks that I don't think people understand.”

About 4 in 10 Americans see China as an enemy, a Pew report shows. That’s a five-year high

Flags of the U.S. and China sit in a room where U.S. Secretary of State Antony Blinken meets with China's Minister of Public Security Wang Xiaohong at the Diaoyutai State Guesthouse, Friday, April 26, 2024, in Beijing, China. (AP Photo/Mark Schiefelbein, Pool)

WASHINGTON (AP) — About 4 in 10 Americans now label China as an enemy, up from a quarter two years ago and reaching the highest level in five years, according to an annual Pew Research Center survey released Wednesday.

Half of Americans think of China as a competitor, and only 6% consider the country a partner, according to the report. The findings come as the Biden administration is seeking to stabilize U.S.-China relations to avoid miscalculations that could result in clashes, while still trying to counter the world’s second-largest economy on issues from Russia’s war in Ukraine to Taiwan and human rights.

Secretary of State Antony Blinken and Treasury Secretary Janet Yellen have both recently visited China in the administration’s latest effort to “responsibly” manage the competition with Beijing. Despite those overtures, President Joe Biden has been competing with former President Donald Trump, the presumptive Republican nominee in November’s election, on being tough on China.

The Pew report, which is drawn from an April 1-7 survey of a sample of 3,600 U.S. adults, found that roughly half of Americans think limiting China’s power and influence should be a top U.S. foreign policy priority. Only 8% don’t think it should be a priority at all.

For the fifth year in a row, about eight in 10 Americans report an unfavorable view of China, the Pew report said.

“Today, 81% of U.S. adults see the country unfavorably, including 43% who hold a very unfavorable opinion. Chinese President Xi Jinping receives similarly negative ratings,” the report said.

About eight in 10 Americans say they have little or no confidence in Xi to do the right thing regarding world affairs. About 10% said they have never heard of him.

American attitudes toward China have turned largely critical after the U.S. launched a trade war against China in 2018 and since the emergence of COVID-19, which was first reported in China. Beijing’s human rights record, its closeness to Russia and its policies toward Taiwan and Hong Kong also have left Americans with negative views of the country, according to Pew’s previous analyses.

At the same time, the U.S. government has been overt about competing with China on economic and diplomatic issues.

Against that backdrop, 42% of Americans say China is an enemy of the U.S., the highest level since 2021, when Pew began asking the question.

The share is much larger among Republicans and Republican-leaning independents, Pew said, with 59% of them describing China as an enemy, compared with 28% of Democrats and those leaning Democratic.

Older Americans, conservative Republicans and those with a sour view of the U.S. economy are more critical of China and more likely to consider the country an enemy, the report said.

“Americans also see China more negatively when they think China’s influence in the world has gotten stronger in recent years or when they think China has a substantial amount of influence on the U.S. economy,” said Christine Huang, a Pew research associate.

“Even pessimism about the U.S. economy is related to how Americans evaluate China: Those who think the economic situation in the U.S. is bad are more likely to see China unfavorably and to see it as an enemy,” she added.

Pew said a nationally representative sample of 3,600 respondents filled out online surveys and that the margin of error was plus or minus 2.1 percentage points.
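For readers wondering how a sample of 3,600 relates to a margin of error of 2.1 points, here is a minimal Python sketch of the textbook 95% margin of error for a simple random sample. The article does not say how Pew computed its figure. The function name `srs_margin_of_error`, and the reading of the gap between the textbook value and the reported 2.1 points as a design effect from weighting, are assumptions made for illustration only.

```python
import math

def srs_margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error, in percentage points, for a simple random
    sample of size n at proportion p (p = 0.5 is the worst case)."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Sample size and reported margin of error as quoted in the article.
srs_moe = srs_margin_of_error(3600)
reported_moe = 2.1

# Assumption: the reported margin is wider because of survey weighting.
# Under that assumption, the implied design effect is the squared ratio.
implied_design_effect = (reported_moe / srs_moe) ** 2

print(round(srs_moe, 2), round(implied_design_effect, 2))  # prints: 1.63 1.65
```

Under these assumptions, the textbook formula alone gives about 1.6 points, so the reported 2.1 points is consistent with a margin that has been adjusted upward rather than one derived from sample size alone.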

IMAGES

  1. Lec5: How to write your First Research Paper in Urdu

  2. How to Write Research Paper Tutorial Urdu/Hindi

  3. Introduction to Research Methodology lecture 1 in URDU/HINDI

  4. LITERATURE REVIEW in Research Methodology Explained in Urdu and Hindi

  5. Outstanding How To Write A Report In Urdu O Level Summary Of Survey

  6. Urdu Report Writing Format O Level

VIDEO

  1. 203rd Urs Mubarak of Khairpur Sachal Sarmast Reh started in 2024 Detailed Report Urdu Sindhi Tv

  2. What is Research? Research Kya Hai? in Urdu/Hindi 2020

  3. Maulana Arshad Madani's address to the Jamiat Ulama-i-Hind session in Lucknow, the capital of Uttar Pradesh, and his remarks to the media

  4. Selecting A Research Topic (Urdu Language)

  5. How to make/ prepare research synopsis presentation in urdu and hindi

  6. INTRODUCTION TO RESEARCH

COMMENTS

  1. MATAN (Urdu Research Journal)

    Biannual research journal. Name: MATAN (Urdu Research Journal) ISSN (print): 2708-5724. ISSN (online): 2708-5732. Publishing year: 2020. Online publishing year: 2020. Institute: Faculty of Arts & Language. Publisher: Department of Urdu, The Islamia University of Bahawalpur. Biannual Double Blind Peer Reviewed Urdu Research Journal ...

  2. 8486 PDFs

    Explore the latest full-text research PDFs, articles, conference papers, preprints and more on URDU. Find methods information, sources, references or conduct a literature review on URDU

  3. Urdu Research Journal

    "Urdu Research Journal" is an open access refereed journal published quarterly. The Journal strives to publish work of high quality in research and literature works across the globe in Urdu language and literary theory. The aim of the journal is to provide high quality research material in Urdu for scholars and researchers.

  4. (Urdu)Research Papers

    Masalein. Daryaft, Vol. 9, Annual Research Journal (HEC Approved), National University of Modern Languages. 2009. 6. Ghalib Aur Ghamgeen Ki Maraslat. Me'yar, Department of Urdu, International Islamic University. 2009.

  5. Journal of Research (Urdu), BZU

    news / October 09, 2023. The Vice Chancellor, Bahauddin Zakariya University, Multan has reconstituted the Editorial Board of the Journal of Research (Urdu), vide notification no. Gen/R-2/10478, dated 06-10-2023. As per the notification, the Chairperson, Department of Urdu, BZU Multan has been appointed as the new Edito...

  6. Research Report Writing [Urdu/Hindi]

    How to write a research report (thesis, dissertation, journal article, conference paper, etc.). A lecture in a workshop for college teachers organized by the...

  7. 8387 PDFs

    Explore the latest full-text research PDFs, articles, conference papers, preprints and more on URDU. Find methods information, sources, references or conduct a literature review on URDU

  8. 8013 PDFs

    Explore the latest full-text research PDFs, articles, conference papers, preprints and more on URDU. Find methods information, sources, references or conduct a literature review on URDU

  9. PDF Design in Urdu Research: an Analysis of Urdu Theses of Pakistani

    universities are also coping with this challenge. Research in Urdu literature has been carried out in most of these universities. A huge number i.e. 4374 dissertations were written till 2008 and this work is still in progress. (Hashmi, 2008) The research design followed in Urdu research seems different from that of other social sciences.

  10. Urdu: How To Write A Research Paper Or Thesis

    Urdu: How To Write A Research Paper Or Thesis? (Australian Islamic Library) ...

  11. A hybrid dependency-based approach for Urdu sentiment analysis

    In conclusion, this research paper introduces a novel framework for concept-level sentiment analysis of Urdu language data sourced from social media platforms. ... LR, and MLP, achieving an ...

  12. What is Research Report

    This video covers the concept of Research Report | Format & Steps of Research Report in Hindi / Urdu. Meaning of Research Report. Definition of Research Report....

  13. What is Research Report Urdu Lecture

    Video Lectures. Book Available. WhatsApp no: 03109316585. The best book for understanding pedagogy, explained in a very easy way ...

  14. Recognition of Urdu sign language: a systematic review of the machine

    Research articles that do not report quantitative outcomes of machine learning. Research articles that only use gesture-based, character-based, or EMG signal-based data as a dataset to recognize Urdu Sign Language. Research articles that do not use either gesture-based, character-based, or EMG signal-based as a dataset to recognize Urdu Sign ...

  15. SentiUrdu-1M: A large-scale tweet dataset for Urdu text sentiment

    Low-resource languages are gaining much-needed attention with the advent of deep learning models and pre-trained word embedding. Though spoken by more than 230 million people worldwide, Urdu is one such low-resource language that has recently gained popularity online and is attracting a lot of attention and support from the research community. One challenge faced by such resource-constrained ...

  16. Efficient Urdu Caption Generation using Attention based LSTM

    Recent advancements in deep learning have created many opportunities to solve real-world problems that remained unsolved for more than a decade. Automatic caption generation is a major research field, and the research community has done a lot of work on it in most common languages like English. Urdu is the national language of Pakistan and also much spoken and understood in the sub-continent ...

  17. Research Paper In Urdu

    That is why the company EssaysWriting provides its services. We remove the responsibility for the result from the clients and do everything to ensure that the scientific work is recognized. Toll free 1 (888)499-5521 1 (888)814-4206. REVIEWS HIRE.

  18. How To Write Research Paper In Urdu

    Connect with one of the best-rated writers in your subject domain. Courtney Lees. #25 in Global Rating. ID 12417. Research Paper. 4.8/5. 15 Customer reviews. 4.8/5.

  19. Research: Negotiating Is Unlikely to Jeopardize Your Job Offer

    In aggregate, candidates perceived the likelihood of jeopardizing a deal as 33% higher compared to managers (approximately 4.6 vs 3.5). The results were consistent among both men and women and ...

  20. (PDF) A Systematic Study of Urdu Language Processing its ...

    Very limited research work has been done in Urdu or Roman Urdu languages. Whereas, Hindi/Urdu is the third largest language in the world. In this paper, we focus on the sentiment analysis of ...

  21. Research Paper In Urdu

    Research Paper In Urdu: Paper Type. ID 10820. Experts to Provide You Writing Essays Service. You can assign your order to: Basic writer. In this case, your paper will be completed by a standard author. It does not mean that your paper will be of poor quality. Before hiring each writer, we assess their writing skills, knowledge of the subjects ...

  22. S&T Releases Market Survey Report for Non-Detonable Training Aids for

    S&T's National Urban Security Technology Laboratory (NUSTL), in conjunction with Johns Hopkins Applied Physics Laboratory, administered the Non-Detonable Training Aids for Explosive Detection Canines Market Survey Report, which provides information on 12 non-detonable training aid products ranging in price from $15 to $550. This report is based on information gathered from manufacturer and vendor ...

  23. How closely do Americans follow local news?

    The share of Americans who say they follow local news very closely now stands at 22% - a decline of 15 percentage points since 2016, when 37% of U.S. adults said the same. Most U.S. adults (66%) still say they follow local news at least somewhat closely, although this number is also down. Roughly eight-in-ten adults (78%) followed local news ...

  24. National Labs Guide Critical AI, Energy Storage, And Grid Research

    Artificial intelligence or AI "will drive unprecedented innovation," says Steven Ashby, director of the Pacific Northwest National Laboratory. "An AI-empowered grid could make autonomous ...

  25. CNBC/NRF Retail Monitor, powered by Affinity Solutions April 2024 Report

    April 2024 Retail Monitor. The CNBC/NRF Retail Monitor provides a first look at how retail sales perform each month. The Retail Monitor leverages Affinity Solutions' data from more than 140 million credit and debit cards, with nearly nine billion transactions totaling more than $500 billion in annual spending, to measure the monthly and ...

  26. Research Paper In Urdu

    Research papers can be complex, so best to give our essay writing service a bit more time on this one. Luckily, a longer paper means you get a bigger discount! Hire a Writer. Elliot Law. #19 in Global Rating. ID 11622. Alexander Freeman. #8 in Global Rating. Toll free 1 (888)499-5521 1 (888)814-4206.

  27. Research Paper In Urdu

    Research Paper In Urdu, Cheap School Essay Proofreading Websites Us, Literary Analysis Of Parker Back By Flannery O Connor, Software Engineering Thesis, Resume Models For Marketing Jobs, Application Of Critical Thinking To Corporate Social Responsibility, Informational Write Organizer Informational Write Organizer ...

  28. (PDF) Urdu Studies

    Abstract. It has been published for the Department of Urdu, Jai Prakash University, Chapra (India), and it contains research papers written by Prof. Shahnaz Nabi, Zehra Mehdi, Dr. Najeeba Arif, Dr ...

  29. Beware of infections from stem cell injections in Mexico, report

    Stem cell injections in Mexico can be hazardous. Report identifies US victims. Health experts are alerting travelers considering medical care abroad about a trio of recent drug-resistant bacterial ...

  30. About 4 in 10 Americans see China as an enemy, a Pew report shows. That

    Updated 6:41 AM PDT, May 2, 2024. WASHINGTON (AP) — About 4 in 10 Americans now label China as an enemy, up from a quarter two years ago and reaching the highest level in five years, according to an annual Pew Research Center survey released Wednesday. Half of Americans think of China as a competitor, and only 6% consider the country a ...