Exploring electrocoagulation through data analysis and text mining perspectives

Ersin Aytac

Zonguldak Bülent Ecevit University, Department of Environmental Engineering, Zonguldak, 67100, Turkey


This study is a bibliometric analysis of electrocoagulation from data analysis and text mining perspectives. The research aimed to capture the current state of electrocoagulation in the literature. Its goals were to identify the less-studied wastewater, electrode, and pollutant types, to find the words most commonly used in article titles, to determine how many pages an average article has, to assess whether electrocoagulation has passed its prime, and to provide information that helps researchers develop their research strategies. The first part of the study was a statistical analysis of the raw data. Density plots were used to reveal information such as cited reference count, publication year, number of pages, and times cited (all databases). A word cloud approach was then used to inspect the abstracts, titles, and keywords. Afterward, the abstracts were grouped into two clusters using word embedding and the k-means algorithm. Descriptive statistics, word clouds, and sentiment analysis were produced for each cluster. Finally, the papers were classified by research area with a decision tree algorithm. The decision tree could not classify the data set adequately, either because the abstracts were not compatible with the research area classes or because there were too many research area categories.
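The clustering step (word embedding followed by k-means with two clusters) can be illustrated with a minimal sketch. The abstract does not say which embedding model or k-means implementation was used, so this sketch assumes each abstract is represented by the average of its word vectors and uses plain Lloyd's k-means. The word vectors and tokenized "abstracts" below are hypothetical 2-D toy values chosen only for illustration; they are not data from the study.

```python
import math
import random


def embed(tokens, vectors):
    """Represent a document as the mean of the word vectors of its known tokens.

    Assumption: averaged word vectors stand in for whatever embedding the study used.
    """
    dim = len(next(iter(vectors.values())))
    found = [vectors[t] for t in tokens if t in vectors]
    if not found:
        return [0.0] * dim  # no known words: fall back to the zero vector
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]


def kmeans(points, k=2, iters=50, seed=0):
    """Plain Lloyd's k-means: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins the centroid at the smallest Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members (keep it if empty).
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(x) / len(members) for x in zip(*members)] if members else centroids[c])
        if new == centroids:
            break  # converged: centroids no longer move
        centroids = new
    return labels, centroids


# Hypothetical toy word vectors and tokenized abstracts (illustration only).
vectors = {
    "iron": [1.0, 0.1], "aluminum": [0.9, 0.0], "electrode": [1.0, 0.2],
    "dye": [0.0, 1.0], "textile": [0.1, 0.9], "removal": [0.2, 1.0],
}
abstracts = [["iron", "electrode"], ["aluminum", "electrode"],
             ["dye", "textile"], ["textile", "removal"]]

points = [embed(doc, vectors) for doc in abstracts]
labels, centroids = kmeans(points, k=2)
print(labels)
```

With these toy vectors the two electrode-focused "abstracts" fall in one cluster and the two dye-focused ones fall in the other. Which cluster is numbered 0 depends on the random initialization.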


bibliometric analysis; decision tree; exploratory data analysis; k-means clustering; sentiment analysis; t-SNE
