PREDICTION OF HEART DISEASE USING ARTIFICIAL NEURAL NETWORK

Heart disease is increasing rapidly due to a number of reasons. If cardiac arrest (a dangerous heart condition) is predicted in its early stages, treatment becomes far more effective. Although doctors and health centres collect data daily, most do not use machine learning and pattern-matching techniques to extract knowledge that can be very useful for prediction. Bioinformatics is a real-world application of machine learning that extracts patterns from datasets using several data mining techniques. In this research paper, the data and attributes are taken from the UCI repository. Attribute extraction is very effective in mining information for prediction; with it, various patterns can be derived to predict heart disease earlier. In this paper, we examine a number of Artificial Neural Network (ANN) techniques. The accuracy is calculated and visualized: the ANN alone gives 94.7%, but with Principal Component Analysis (PCA) the accuracy rate improves to 97.7%.

A. Neural Networks. A single neuron is also known as a perceptron. The human brain consists of more than ten billion neurons. In an ANN, a single perceptron has two parts: a sum function and a transfer function. The sum function is common to almost every ANN model, but there are different transfer functions with different natures according to their theories.

Z = Σ x_i w_i   (1)

After calculating the sum of the products of all inputs and weights using equation (1), the next step is the transfer function. Transfer functions come in different types, such as the sigmoid function, sign function, hyperbolic tangent function, ReLU function, and step function.

y = 1 / (1 + e^(−Z))   (2)

Equation (2) shows the sigmoid function; its output value remains between 0 and 1. The sign function simply maps any non-negative number to +1 and any negative number to −1, as shown in equation (3):

f(x) = { −1, x < 0 ; +1, x ≥ 0 }   (3)

The hyperbolic tangent function (4) gives an output between −1 and 1. Like the sign function, it is used for continuous-valued output, but the difference is that it also produces negative values:

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))   (4)

The ReLU function is used in the convolutional neural network, a type of deep learning neural network for images. It simply replaces negative values with zero, as shown in (5):

ReLU(x) = max(0, x)   (5)

The simplest transfer function is the step function, which is used for linear problems. As shown in equation (6), the threshold θ controls the y-intercept:

f(x) = { 0, x < θ ; 1, x ≥ θ }   (6)

The perceptron is the mathematical model of the neuron. It is a single neuron connected to inputs, and it has only one output. It is used to classify linear problems. Figure 1 shows the perceptron model with inputs x1, x2, x3 and corresponding weights w1, w2, w3. The perceptron has two parts: the sum function, shown in (1), and the step function as the activation function, shown in (6).
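The sum and transfer functions above can be sketched in a few lines of NumPy. This is an illustrative sketch of equations (1)-(6), not the implementation used in the paper; function and variable names are our own.

```python
import numpy as np

def weighted_sum(x, w):
    # Equation (1): Z = sum of x_i * w_i
    return np.dot(x, w)

def sigmoid(z):
    # Equation (2): output stays in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sign_fn(z):
    # Equation (3): -1 for negative inputs, +1 otherwise
    return np.where(z < 0, -1.0, 1.0)

def tanh_fn(z):
    # Equation (4): output stays in (-1, 1)
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def relu(z):
    # Equation (5): replace negative values with zero
    return np.maximum(0.0, z)

def step(z, theta=0.0):
    # Equation (6): 0 below the threshold theta, 1 at or above it
    return np.where(z < theta, 0.0, 1.0)

def perceptron(x, w, theta=0.0):
    # Sum function (1) followed by the step activation (6)
    return step(weighted_sum(x, w), theta)
```

For example, `perceptron([1.0, 1.0], [0.5, 0.5], theta=0.5)` fires (returns 1.0) because the weighted sum 1.0 is above the threshold.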
B. Multi-Layer Perceptron. An artificial neural network (ANN), also known simply as a "neural network" (NN) or multi-layer perceptron, is a mathematical or computational model based on the neural networks found in human anatomy. It is built on the analogy of a single neuron of the human brain and has multiple perceptrons at multiple levels. Every perceptron has its own weights, which affect the value passed forward and the output. Hidden layers compute intermediate values, and the final output layer generates the result.  E. Nonlinear Autoregressive Exogenous. The nonlinear autoregressive exogenous (NARX) model is also like the FFNN and is a type of RNN. The difference is that the input to the hidden layer comes both from the input layer and from the output layer with some delays. It gives much higher accuracy when the data is a time series, as the Hidden Markov Model (HMM) does. Figure 4 shows the NARX model. F. Cascade-Forward Neural Networks. A cascade-forward neural network (CFNN) is like the FFNN, with the difference that the input neurons have additional connections to every neuron of every subsequent layer of the model. This research concerns heart disease prediction using an ensemble technique. We use the KDD process and neural network techniques for the prediction of heart disease, so that we can compare the results before and after PCA. We try to find results that let us predict heart disease for a patient, and we also aim to show that algorithm accuracy increases after applying PCA and the ensemble technique. The paper is divided into the following sections.
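A multi-layer perceptron chains the per-neuron sum and transfer functions across layers. The following forward-pass sketch uses hypothetical layer sizes (13 inputs, matching the number of dataset attributes, one hidden layer of 8 units, one output) and the sigmoid transfer function throughout; it is an illustration of the idea, not the network trained in the paper.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass of a simple multi-layer perceptron.

    Each layer applies a weighted sum (equation (1), in matrix form)
    followed by a sigmoid transfer function (equation (2)).
    """
    a = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        z = a @ W + b                     # sum function for the whole layer
        a = 1.0 / (1.0 + np.exp(-z))      # sigmoid transfer function
    return a

# Hypothetical 13-8-1 network with random (untrained) weights:
rng = np.random.default_rng(0)
weights = [rng.normal(size=(13, 8)), rng.normal(size=(8, 1))]
biases = [np.zeros(8), np.zeros(1)]
out = mlp_forward(rng.normal(size=13), weights, biases)
```

The single sigmoid output can be read as the predicted probability of the positive class (heart disease present).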
2. Related Work. Paper [13] uses the data [19] from the UCI data repository. The authors use a fuzzy genetic algorithm for pre-processing and then test their data with naïve Bayes, decision tree, and nearest-neighbour classifiers. The accuracy rate for naïve Bayes is 96.5%, for the decision tree 99.2%, and for classification via clustering 88.3%. An Intelligent Heart Disease Prediction System (IHDPS) was developed using data mining techniques; naïve Bayes, neural networks, and decision trees were proposed as a solution by Sellappan Palaniappan et al. [20]. Every technique has its own particular strengths for obtaining proper outcomes. To construct this framework, hidden patterns and the relationships between them are utilized. It is online, easy to understand, and expandable. A novel strategy to build a multi-parametric feature set from linear and nonlinear characteristics of HRV (Heart Rate Variability) was proposed by Heon Gyu Lee et al. [21]. To accomplish this, they used several classifiers, e.g. Bayesian classifiers, CMAR (Classification based on Multiple Association Rules), C4.5 (decision tree), and SVM (Support Vector Machine). In [24], the prediction of coronary disease was analyzed using data mining frameworks such as decision trees, naïve Bayes, neural networks, clustering, and genetic algorithms, and a comparison of the various systems used in coronary disease prediction was given. As with other heart diagnosis problems, classification systems have been used for the heart disease diagnosis problem, too. When the studies in the literature related to this classification application are examined, it can be seen that a great variety of methods were used, reaching high classification accuracies on the dataset taken from the UCI machine learning repository. Among these, ToolDiag, RA obtained 50.00% classification accuracy using IB1-4 algorithms [8].
WEKA, RA obtained a classification accuracy of 58.50% using the InductH algorithm [4], while ToolDiag, RA reached 60.00% with the RBF algorithm [8]. WEKA, RA also applied the FOIL algorithm to the problem and obtained a classification accuracy of 64.00% [8]. The MLP+BP algorithm used by ToolDiag, RA reached 65.60% [8]. The classification accuracies obtained with T2, 1R, IB1c, and K*, applied by WEKA, RA, were 68.10%, 71.40%, 74.00%, and 76.70%, respectively [8]. Robert Detrano used a logistic regression algorithm and obtained 77.0% classification accuracy. A fuzzy expert system obtained 79%, performing as well as the expert. Moreover, Cheung utilized the C4.5, naïve Bayes, BNND, and BNNF algorithms and reached classification accuracies of 81.11%, 81.48%, 81.11%, and 80.96%, respectively [8].
3. Methodology. We follow the KDD model, which involves collecting the data sets, preprocessing them, and building a pattern-matching classifier model with an ANN. After training the models, we ensemble them to obtain a better decision. We use the Weka tool to implement the methodology, as shown in the implementation-design figure (Figure 5). In the first step, the data is preprocessed for training using the PCA algorithm; the processed data is then input to the ANN.
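The paper implements this pipeline in Weka. As a hedged illustration of the PCA preprocessing step only, the projection onto the top principal components can be written in plain NumPy (synthetic random data stands in for the heart dataset; `n_components=8` is an arbitrary choice for the sketch):

```python
import numpy as np

def pca_transform(X, n_components):
    """Project data onto its top principal components (plain NumPy sketch)."""
    Xc = X - X.mean(axis=0)                    # center each attribute
    cov = np.cov(Xc, rowvar=False)             # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # sort descending by variance
    components = eigvecs[:, order[:n_components]]
    return Xc @ components

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 13))                 # stand-in for the 13 attributes
X_reduced = pca_transform(X, n_components=8)   # reduced data fed to the ANN
```

The reduced matrix (here 300 x 8 instead of 300 x 13) is what would then be passed to the ANN for training.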

Datasets.
The heart disease diagnosis medical-report data used, the "Cleveland heart data", is taken from the UCI data repository [16]. As shown in Table 1, only 13 attributes are selected for the research, such as age (years), sex (1 = male; 0 = female), cp, trestbps (resting blood pressure in mm Hg), chol (cholesterol in mg/dl), fbs (fasting blood sugar), restecg (resting electrocardiographic results), oldpeak (ST depression induced by exercise relative to rest), slope (the slope of the peak exercise ST segment), ca (number of major vessels), thal (reversible defect), and num (class attribute: 1 = heart disease; 0 = no heart disease).
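Loading and binarizing the class attribute of this dataset can be sketched with pandas. The column order below follows the UCI documentation for the processed Cleveland file (which uses "?" for missing values); the two data rows here are hypothetical stand-ins for the real download.

```python
import io
import pandas as pd

# Column order of the processed Cleveland file per the UCI repository.
columns = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]

# Two hypothetical rows standing in for the real file.
sample = io.StringIO(
    "63,1,1,145,233,1,2,150,0,2.3,3,0,6,0\n"
    "67,1,4,160,286,0,2,108,1,1.5,2,3,3,1\n"
)
df = pd.read_csv(sample, names=columns, na_values="?")

# Binarize the class attribute: 0 = no disease, values 1-4 collapse to 1.
df["num"] = (df["num"] > 0).astype(int)
```

With the real file, the same `read_csv` call would simply point at the downloaded path instead of the `StringIO` sample.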

Figure 5: Cleveland and Statlog data are merged, then PCA, normalization, ANN building, and testing are applied.

Results are generated on two data sets, Cleveland and Statlog, both taken from the UCI repository. Results are also generated on the merged data by introducing one more variable, "source", whose value is "1" for Cleveland and "2" for Statlog.

A. Confusion Matrix. By the analysis, in 48.5% of records 0 is required and 0 is achieved, and in 7.9% of records 0 is required but the result is 1. There are also 5.6% of records where 1 is desired and 0 results, and 38.0% where 1 is desired and 1 is actual. So the accuracy becomes 86.5%.
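Each accuracy figure reported in this section is simply the sum of the two diagonal (correctly classified) cells of the confusion matrix when the cells are expressed as percentages of all records. A one-line helper makes the arithmetic explicit:

```python
def accuracy_from_confusion(pct_00, pct_01, pct_10, pct_11):
    """Accuracy from a confusion matrix given in percentages of all records.

    pct_00: required 0, got 0;  pct_01: required 0, got 1;
    pct_10: required 1, got 0;  pct_11: required 1, got 1.
    """
    return pct_00 + pct_11  # the two correct (diagonal) cells

# Figures reported above for the merged-data model: 48.5 + 38.0 = 86.5
acc = accuracy_from_confusion(48.5, 7.9, 5.6, 38.0)
```

The same computation reproduces the later figures as well, e.g. 52.8 + 44.9 = 97.7 for the cascade-forward network with PCA.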

There are also 6.3% of records where 1 is desired and 0 results, and 35.0% where 1 is desired and 1 is the result. So the accuracy becomes 82.8%.

Recurrent Neural Network
A Recurrent Neural Network (RNN) is similar to the FFNN, with the difference being a loopback from the hidden layer to the input layer, i.e. the output of the hidden layer goes back in as an input with some delays. There are also 3.0% of records where 1 is desired and 0 results, and 43.6% where 1 is desired and 1 is the result. So the accuracy becomes 94.7%.
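The hidden-layer loopback can be sketched as a minimal forward pass in NumPy: at each time step the hidden state from the previous step is fed back in alongside the current input. Shapes and the choice of tanh/sigmoid transfer functions here are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def rnn_forward(xs, W_in, W_hid, W_out):
    """Minimal RNN: hidden output feeds back with a one-step delay."""
    h = np.zeros(W_hid.shape[0])
    outs = []
    for x in xs:
        h = np.tanh(W_in @ x + W_hid @ h)                # delayed feedback
        outs.append(1.0 / (1.0 + np.exp(-(W_out @ h))))  # sigmoid output
    return np.array(outs)

rng = np.random.default_rng(1)
xs = rng.normal(size=(5, 3))                  # 5 time steps, 3 features each
out = rnn_forward(xs, rng.normal(size=(4, 3)),
                  rng.normal(size=(4, 4)), rng.normal(size=(1, 4)))
```

Because of the feedback term `W_hid @ h`, the output at each step depends on all earlier inputs, which is what distinguishes it from a plain FFNN.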
11. NARX Model. The NARX model is also like the FFNN and is a type of RNN; the difference is that the input to the hidden layer comes from the input layer and from the output layer with some delays. Only in 49.5% of records is 0 required and 0 achieved, and in 8.9% of records 0 is required but the result is 1.
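The defining feature of NARX, the delayed output fed back into the hidden layer together with the current input, can be sketched as follows. All shapes, the number of delays, and the transfer functions are hypothetical choices for illustration.

```python
import numpy as np

def narx_forward(xs, n_delays, W_x, W_y, W_out):
    """NARX sketch: hidden layer sees the current input plus delayed outputs."""
    y_hist = np.zeros(n_delays)                # buffer of past outputs
    outs = []
    for x_t in xs:
        h = np.tanh(W_x @ x_t + W_y @ y_hist)  # input + delayed-output terms
        y_t = 1.0 / (1.0 + np.exp(-(W_out @ h)))
        outs.append(y_t.item())
        y_hist = np.roll(y_hist, 1)
        y_hist[0] = outs[-1]                   # feed output back with delay
    return np.array(outs)

rng = np.random.default_rng(2)
xs = rng.normal(size=(6, 3))                   # 6 time steps, 3 features each
out = narx_forward(xs, 2, rng.normal(size=(4, 3)),
                   rng.normal(size=(4, 2)), rng.normal(size=(1, 4)))
```

The `W_y @ y_hist` term is what distinguishes NARX from a plain RNN: it is the model's own past outputs, not the hidden state, that are delayed and fed back.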
12. Cascade Forward Neural Networks with PCA. After obtaining these results, we train the model again to see whether it gives more accuracy. We achieve a higher accuracy level by applying PCA to the data before giving it to the model, which really does increase the accuracy.
A. Confusion Matrix. In the following confusion matrix observation, in 52.8% of records 0 is required and 0 is achieved, and in only 1.0% of records 0 is required but the result is 1.

There are also 1.3% of records where 1 is desired and 0 results, and 44.9% where 1 is desired and 1 is the result. So the accuracy becomes 97.7%.

Results
Conclusion. For clear understanding, the results/prediction rate for each algorithm is summarized in tabular form as well as in graph representations, and the best prediction rate obtained with each technique/methodology is summarized by studying, analyzing, and performing an ensemble-based technique. Different variations of ANN give different accuracy rates. The accuracy before and after applying PCA is different: 94.7% before, and up to 97.7% after applying PCA. A large difference in accuracy is observed, so heart disease prediction is feasible. Accuracy can be further increased by tuning the settings and optimizing each algorithm according to the nature of the data.