Open Access
Wuhan Univ. J. Nat. Sci.
Volume 28, Number 2, April 2023
Page(s) 117 - 128
DOI https://doi.org/10.1051/wujns/2023282117
Published online 23 May 2023

© Wuhan University 2023

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

0 Introduction

In March 2014, the "11 Chaori Bond" was declared in default because ST Chaori was unable to repay its principal and interest, making it the first bond default in China's history[1]. Since then, defaults have occurred from time to time in China's credit market. In 2020, owing to the external environment, downward pressure on the domestic economy created great uncertainty for highly indebted companies, and the scale of credit defaults hit another record high: annual bond defaults exceeded 50 billion, the highest level in nearly five years[2]. In the first half of 2021 alone, the value of defaulted bonds was already close to 98.4 billion, substantially exceeding the level of the same period in 2020. On 17 August 2021, the Central Financial and Economic Commission emphasized the need to strengthen the financial rule of law and financial infrastructure, deepen the construction of the credit system, and give full play to the fundamental role of credit in identifying, monitoring, managing and disposing of financial risks[3]. In this institutional environment, the study of corporate credit risk has clear practical relevance and policy implications.

Listed enterprises are the cornerstone of China's securities market. Their quality, the standardization of their conduct and their financial situation directly affect the development of the Chinese securities market and the interests of investors[4]. Research methods for the credit risk of listed companies fall mainly into traditional credit risk analysis, multivariate statistical models, and artificial intelligence-based credit risk assessment models[5]. Traditional credit risk analysis mainly includes the 5C element analysis method, the LAPP (Liquidity, Activity, Profitability, Potentialities) method, and the five-level classification method. These methods assess credit risk primarily through the experience and subjective analysis of bank experts. In practice, because they are highly subjective, the determination of debtor default rates, default factors and their weights relies heavily on the creditors' own judgment, so the resulting credit risk assessments lack objectivity and scientific rigor[6,7]. Following the traditional methods, multivariate statistical models have been widely used abroad to assess credit risk; they can be summarized as linear probability models, logistic models, probit models and discriminant analysis models[8]. Neural network techniques, which do not require normally distributed data and can capture non-linear relationships, were introduced into corporate credit evaluation in the 1990s; they overcame inherent limitations of traditional credit risk assessment models and, with the advent of the big data era, have further promoted the development of credit risk assessment models[9].

The greatest advantage of multivariate statistical models is their clear interpretability, while the strength of neural network techniques lies in combining a large number of non-linear layers, which allows features to be extracted from the raw data at various levels of abstraction[10]. This paper therefore combines deep belief networks, exploratory factor analysis, confirmatory factor analysis, a structural equation model and the logistic distribution to measure credit risk. A deep belief network performs the first-level extraction, selecting from the financial indicators those that have a significant impact on credit risk; exploratory factor analysis performs the second-level extraction; and confirmatory factor analysis performs the final extraction of the risk measure indexes. The structural equation model and the logistic distribution are then applied to these risk measure indexes to quantify credit risk. In this way, the latent variables affecting credit risk and their paths of influence (the quantitative structural relationship between each latent variable and credit risk) are identified, and a credit risk measurement model is obtained.

1 Establishment of Credit Risk Assessment Method

The data we obtained may contain redundant information and multicollinearity, so we need to extract from the original financial indicators those that have a significant impact on credit risk. We use a deep belief network (DBN) to learn the deep non-linear structure of the input data, extract the essential features implied by the data, and perform the first-level extraction of the original financial indicators[11]. We then use exploratory factor analysis (EFA) to determine how many common factors affect the indicators and the nature of each factor, and perform the second-level extraction on the indicators retained at the first level[12]. Finally, we use confirmatory factor analysis (CFA) to test structural validity, convergent validity and discriminant validity, which yields the final indicators we need, i.e., the risk measure indexes. A structural equation model (SEM) and the logistic distribution are then used to identify the specific paths through which the risk measure indexes affect credit risk and the latent variables behind these indexes, and to quantify credit risk, as shown in the flow chart in Fig. 1[13].

Fig. 1

The flow chart of credit risk measurement model construction

1.1 Risk Measure Indexes Extraction

Let the financial indicators be $X_1, X_2, \ldots, X_N$. The original risk level is divided into two categories: ST (marked with "0") and non-ST (marked with "1").

In the first step, the risk measure indexes are extracted at the first level by the DBN. Based on the magnitudes of the weight parameters of the neurons in each layer, we trace back from the output layer to find the $n$ input neurons $X_{i_1}, X_{i_2}, \ldots, X_{i_n}$ with the largest weights; these form the first-level extraction of the risk measure indexes.

In the second step, the risk measure indexes are extracted at the second level by EFA. From the indexes retained at the first level, we keep those whose factor loadings exceed 0.7 and whose communalities exceed 0.4 in the EFA, giving the second-level risk measure indexes $X_{t_1}, X_{t_2}, \ldots, X_{t_n}$.
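The second-level screening rule can be sketched in a few lines, assuming the rotated loading matrix from the EFA is already available as a NumPy array. Communality is taken as the sum of an indicator's squared loadings (the usual definition for orthogonal factors). The `cross_gap` parameter, which drops indicators with similar loadings on two factors (the cross-loading rule applied in Section 2.2.2), is a hypothetical threshold; the paper gives no number for it. The loadings in the usage lines are illustrative, not taken from the paper's tables.

```python
import numpy as np

def screen_by_efa(loadings, names, min_loading=0.7, min_communality=0.4, cross_gap=0.2):
    """Second-level screening of indicators from a rotated EFA loading matrix.

    loadings: (p, m) array, one row per indicator, one column per factor.
    cross_gap: hypothetical minimum gap between the two largest |loadings|,
               used to drop indicators that load similarly on two factors.
    """
    L = np.asarray(loadings, dtype=float)
    communality = (L ** 2).sum(axis=1)            # h^2 = sum of squared loadings
    ranked = np.sort(np.abs(L), axis=1)[:, ::-1]  # |loadings|, largest first
    primary = ranked[:, 0]
    secondary = ranked[:, 1] if L.shape[1] > 1 else np.zeros(len(L))
    keep = ((primary > min_loading)
            & (communality > min_communality)
            & (primary - secondary >= cross_gap))
    return [n for n, k in zip(names, keep) if k], communality

# illustrative loadings on two factors (not from the paper)
names = ["A", "B", "C", "D"]
L = [[0.90, 0.10],    # strong, clean loading: kept
     [0.60, 0.20],    # largest loading below 0.7: dropped
     [0.72, 0.68],    # similar loadings on both factors: dropped
     [0.10, -0.85]]   # strong (negative) loading: kept
kept, h2 = screen_by_efa(L, names)
print(kept)  # ['A', 'D']
```

Using absolute loadings means a strongly negative loading counts as strong, which matches how the paper later treats X16 (-0.789) and X8 (-0.732).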

In the third step, the final extraction of the risk measure indexes is performed through CFA. The indexes retained at the second level must pass the p-value and standardized loading coefficient tests in the CFA. In addition, they must satisfy average variance extracted (AVE) > 0.5 and composite reliability (CR) > 0.7, and the square root of the AVE must exceed the largest absolute correlation coefficient between the factors. The indexes that meet all of these conditions are taken as the final risk measure indexes $X_{s_1}, X_{s_2}, \ldots, X_{s_n}$.
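The three CFA conditions can be checked with the standard formulas, assuming standardized loadings and uncorrelated measurement errors: AVE is the mean squared loading, CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)), and discriminant validity compares √AVE with the largest absolute inter-factor correlation (the Fornell-Larcker criterion). This is a sketch, not the software the authors used; the dictionary layout and the factor name `weak` in the usage lines are hypothetical.

```python
import numpy as np

def ave(loadings):
    """Average variance extracted: mean of squared standardized loadings."""
    lam = np.asarray(loadings, dtype=float)
    return float(np.mean(lam ** 2))

def composite_reliability(loadings):
    """CR = (sum lambda)^2 / ((sum lambda)^2 + sum(1 - lambda^2)), uncorrelated errors assumed."""
    lam = np.asarray(loadings, dtype=float)
    s2 = lam.sum() ** 2
    return float(s2 / (s2 + np.sum(1.0 - lam ** 2)))

def passes_cfa(factor_loadings, factor_corr, ave_min=0.5, cr_min=0.7):
    """Apply the AVE, CR and sqrt(AVE) > max|r| conditions to each factor.

    factor_loadings: {factor: [standardized loadings]}
    factor_corr: {(factor_a, factor_b): correlation between the two factors}
    """
    verdict = {}
    for f, lam in factor_loadings.items():
        a, cr = ave(lam), composite_reliability(lam)
        max_r = max((abs(r) for pair, r in factor_corr.items() if f in pair), default=0.0)
        verdict[f] = bool(a > ave_min and cr > cr_min and np.sqrt(a) > max_r)
    return verdict

profit = [0.978, 0.905, 0.994]   # profitability loadings reported in Section 2.3.2
print(round(float(np.sqrt(ave(profit))), 2))   # 0.96
print(passes_cfa({"profitability": profit, "weak": [0.254, 0.30]},   # "weak" is hypothetical
                 {("profitability", "weak"): 0.07}))
# {'profitability': True, 'weak': False}
```

With the profitability loadings reported later in the paper, √AVE comes out at about 0.96, the value quoted in the discriminant-validity discussion of Section 2.2.3.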

1.2 Credit Risk Quantification

The common factors obtained from the CFA during the extraction of the risk measure indexes, denoted $\xi_1, \xi_2, \ldots, \xi_m$, serve as latent variables for measuring the credit risk $\eta$ of listed companies. Their corresponding risk measure indexes $X_{s_1}, X_{s_2}, \ldots, X_{s_n}$ serve as observed variables, and the credit risk measurement $X'$ is the observed variable corresponding to the credit risk $\eta$.

Based on the structural equation model, the structural model and measurement model of credit risk are:

$$\eta = l_1\xi_1 + l_2\xi_2 + \cdots + l_m\xi_m + \varepsilon_\eta \tag{1}$$

$$X' = b\,\eta \tag{2}$$

where $l_1, l_2, \ldots, l_m$ and $b$ are the regression coefficients and $\varepsilon_\eta$ is the error term.

The measurement model linking each observed variable to the latent variables is:

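$$\begin{cases} X_{s_1} = a_{11}\xi_1 + a_{12}\xi_2 + \cdots + a_{1m}\xi_m + \varepsilon_1 \\ X_{s_2} = a_{21}\xi_1 + a_{22}\xi_2 + \cdots + a_{2m}\xi_m + \varepsilon_2 \\ \quad\vdots \\ X_{s_n} = a_{n1}\xi_1 + a_{n2}\xi_2 + \cdots + a_{nm}\xi_m + \varepsilon_n \end{cases} \tag{3}$$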

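where $\begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nm} \end{pmatrix}$ is the matrix of regression coefficients and $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are the error terms.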

Expressing each latent variable as a linear combination of the observed variables gives:

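$$\begin{cases} \xi_1 = b_{11}X_{s_1} + b_{12}X_{s_2} + \cdots + b_{1n}X_{s_n} + \varepsilon_1 \\ \xi_2 = b_{21}X_{s_1} + b_{22}X_{s_2} + \cdots + b_{2n}X_{s_n} + \varepsilon_2 \\ \quad\vdots \\ \xi_m = b_{m1}X_{s_1} + b_{m2}X_{s_2} + \cdots + b_{mn}X_{s_n} + \varepsilon_m \end{cases} \tag{4}$$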

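where $\begin{pmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{m1} & \cdots & b_{mn} \end{pmatrix}$ is the matrix of regression coefficients and $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m$ are the error terms (distinct from the error terms in Eq. (3)).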

Combining Eqs. (1), (2) and (4) yields our risk measurement:

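$$\begin{aligned}
X' = b\eta &= b\left[l_1\left(b_{11}X_{s_1} + \cdots + b_{1n}X_{s_n} + \varepsilon_1\right) + \cdots + l_m\left(b_{m1}X_{s_1} + \cdots + b_{mn}X_{s_n} + \varepsilon_m\right)\right] \\
&= b\left[\left(l_1b_{11} + \cdots + l_mb_{m1}\right)X_{s_1} + \cdots + \left(l_1b_{1n} + \cdots + l_mb_{mn}\right)X_{s_n} + \left(l_1\varepsilon_1 + \cdots + l_m\varepsilon_m\right)\right] \\
&= bc_1X_{s_1} + bc_2X_{s_2} + \cdots + bc_nX_{s_n} + \varepsilon \\
&= h_1X_{s_1} + h_2X_{s_2} + \cdots + h_nX_{s_n} + \varepsilon
\end{aligned} \tag{5}$$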

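where $h_1, h_2, \ldots, h_n$ are the regression coefficients, with $c_j = l_1b_{1j} + \cdots + l_mb_{mj}$ and $h_j = bc_j$, and $\varepsilon = b(l_1\varepsilon_1 + \cdots + l_m\varepsilon_m)$ is the error term.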
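Eq. (5) says that each coefficient $h_j$ is the loading $b$ times the $j$-th entry of the row vector $l^\top B$, where $B$ is the $m \times n$ matrix of coefficients $b_{kj}$ from Eq. (4). A small NumPy sketch of that composition follows; the function name and the toy numbers are illustrative only.

```python
import numpy as np

def compose_risk_coefficients(b, l, B):
    """Return h = b * (l @ B), i.e. h_j = b * (l_1 b_1j + ... + l_m b_mj) as in Eq. (5).

    b : loading of the risk measurement X' on credit risk eta (Eq. 2)
    l : length-m structural coefficients l_1..l_m (Eq. 1)
    B : (m, n) coefficients b_kj expressing xi_k through X_s1..X_sn (Eq. 4)
    """
    l = np.asarray(l, dtype=float)
    B = np.asarray(B, dtype=float)
    c = l @ B          # c_j = sum_k l_k * b_kj
    return b * c       # h_j = b * c_j

# toy example with m = 2 latent variables and n = 3 indicators
h = compose_risk_coefficients(2.0, [1.0, 0.5], [[1.0, 0.0, 2.0],
                                                [0.0, 4.0, 2.0]])
print(h)  # [2. 4. 6.]
```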

Finally, the credit risk level $X$ is obtained from the logistic distribution:

$$\begin{cases} p(X=1 \mid X') = \dfrac{1}{1+e^{-X'}} \\[4pt] p(X=0 \mid X') = 1 - p(X=1 \mid X') = \dfrac{e^{-X'}}{1+e^{-X'}} \end{cases} \tag{6}$$

where $X$ takes one of two classification results: ST (marked with "0") or non-ST (marked with "1").
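Eq. (6) is the standard logistic function of $X'$. A minimal Python sketch follows; turning the probability into an ST / non-ST label with a 0.5 cut-off is an assumption (the paper does not state its threshold), and with that cut-off the rule reduces to checking whether $X' > 0$.

```python
import math

def risk_probabilities(x_prime):
    """Eq. (6): return (p(X=1 | X'), p(X=0 | X')), with 1 = non-ST and 0 = ST."""
    if x_prime >= 0:                       # stable evaluation for large |X'|
        p1 = 1.0 / (1.0 + math.exp(-x_prime))
    else:
        z = math.exp(x_prime)
        p1 = z / (1.0 + z)                 # same value as 1 / (1 + e^{-X'})
    return p1, 1.0 - p1

def risk_level(x_prime, threshold=0.5):
    """Assumed decision rule: label non-ST (1) when p(X=1 | X') exceeds the threshold."""
    p1, _ = risk_probabilities(x_prime)
    return 1 if p1 > threshold else 0

print(risk_probabilities(0.0))            # (0.5, 0.5)
print(risk_level(0.3), risk_level(-0.3))  # 1 0
```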

2 Experiments

2.1 Data and Data Pre-Processing

Considering data availability and sample representativeness, this paper selects all non-ST companies in the CSI 300 index (300 companies) and all listed ST companies (190 companies), 490 companies in total, as research subjects. The data set consists of all financial information on these 490 companies in the Wind database from Q1 2019 to Q1 2021, covering 200 financial indicators. Every company has some missing data: 21 financial indicators are missing for all companies in individual quarters, and 207 companies lack some financial indicators entirely. We therefore imputed the missing values using cubic spline interpolation and K-nearest-neighbor interpolation based on the generalized cosine.
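The two imputation steps can be sketched as follows, under stated assumptions. Gaps along the time axis (an indicator missing in some quarters) are filled with a natural cubic spline through the observed quarters of the same company. Indicators missing entirely for a company are filled from the k most similar companies. The paper does not define its "generalized cosine", so plain cosine similarity on the commonly observed indicators is substituted here; the neighbor count `k`, the unweighted average and all function names are hypothetical, and in practice the indicators would be standardized first.

```python
import numpy as np

def natural_cubic_spline(xk, yk, xq):
    """Evaluate the natural cubic spline through (xk, yk) at points xq."""
    xk, yk, xq = (np.asarray(a, dtype=float) for a in (xk, yk, xq))
    n = len(xk)
    h = np.diff(xk)
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0           # natural ends: second derivative is 0
    for i in range(1, n - 1):           # continuity of the first derivative
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((yk[i + 1] - yk[i]) / h[i] - (yk[i] - yk[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)         # second derivatives at the knots
    j = np.clip(np.searchsorted(xk, xq) - 1, 0, n - 2)   # interval of each query
    x0, x1, hj = xk[j], xk[j + 1], h[j]
    return (M[j] * (x1 - xq) ** 3 / (6 * hj) + M[j + 1] * (xq - x0) ** 3 / (6 * hj)
            + (yk[j] / hj - M[j] * hj / 6) * (x1 - xq)
            + (yk[j + 1] / hj - M[j + 1] * hj / 6) * (xq - x0))

def fill_quarters(series):
    """Fill missing quarters of one company's indicator series (needs >= 2 observed)."""
    y = np.asarray(series, dtype=float).copy()
    t = np.arange(len(y))
    miss = np.isnan(y)
    if miss.any():
        y[miss] = natural_cubic_spline(t[~miss], y[~miss], t[miss])
    return y

def knn_cosine_impute(X, k=3):
    """Fill each missing cell from the k most cosine-similar companies that observe it."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    for r, c in zip(*np.where(np.isnan(X))):
        candidates = []
        for d in range(X.shape[0]):
            if d == r or np.isnan(X[d, c]):
                continue
            shared = ~np.isnan(X[r]) & ~np.isnan(X[d])   # indicators both companies report
            a, b = X[r, shared], X[d, shared]
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            if denom > 0:
                candidates.append((float(a @ b / denom), X[d, c]))
        candidates.sort(key=lambda s: s[0], reverse=True)
        if candidates:
            out[r, c] = np.mean([v for _, v in candidates[:k]])
    return out

print(fill_quarters([1.0, 3.0, np.nan, 7.0, 9.0]))  # [1. 3. 5. 7. 9.]
X = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, 20.0],
              [1.0, 0.1, np.nan],
              [0.9, 0.0, 12.0]])
print(knn_cosine_impute(X, k=2)[2, 2])  # 11.0
```

For linear data the natural spline reduces to straight-line interpolation, which is why the gap in the first example is filled with exactly 5.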

2.2 Credit Risk Measure Indexes Extraction

2.2.1 First-level extraction

In our DBN, we use 55% of the data as training samples and 45% as testing samples. The 200 financial indicators form the visible-layer neurons, and the original risk level is divided into two categories, ST and non-ST. The last hidden layer is connected to a back-propagation (BP) neural network whose output-layer neurons are the two classification results, ST (marked with "0") and non-ST (marked with "1"). We then determine the number of hidden layers and the number of nodes in each hidden layer by trial and error, using the accuracy on the testing samples to judge the generalization performance of each structure; the results are shown in Fig. 2.

Fig. 2

The accuracy of the testing sample

When setting the number of nodes in the last hidden layer, we fix it at 12, since the risk assessment system has 12 first-level (level I) indicators. To reduce the loss of information passed through the intermediate layers between the 200 input neurons and these 12 neurons, we first searched over a wide range, setting the preceding hidden layer to 150, 100 and 50 nodes. The experiments showed high accuracy at around 100 nodes and low accuracy below 50 or above 150 nodes. We therefore set the number of nodes in that hidden layer to 120, 80, 60 and 40 and selected the best of these. The best node numbers for networks with three and four hidden layers were found in the same way. The results in Fig. 2 show that the network with three hidden layers achieves higher recognition performance than the networks with two or four hidden layers. Because the two-hidden-layer networks performed best with 60 or 80 nodes in the first hidden layer, the three-hidden-layer network was built by adding a new first hidden layer in front of second and third hidden layers with 60 and 12 nodes, and the new first hidden layer was set to 140, 120, 100 and 80 nodes in turn. The network performed best with 120 nodes in the first hidden layer. We therefore chose a DBN with three hidden layers of 120, 60 and 12 nodes for the first-level extraction of the risk measure indexes.
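The unsupervised part of such a DBN can be sketched as a stack of Bernoulli restricted Boltzmann machines (RBMs) trained greedily, layer by layer, with one-step contrastive divergence (CD-1), using the chosen 200-120-60-12 structure. The paper does not report its training settings, so the learning rate, number of epochs, weight initialization and the min-max scaling of indicators to [0, 1] are assumptions; the supervised BP fine-tuning and the trial-and-error search over layer sizes are omitted. This is a minimal NumPy sketch, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(V, n_hidden, epochs=10, lr=0.05, rng=None):
    """Bernoulli RBM trained with one-step contrastive divergence (CD-1).

    V: (samples, n_visible) array scaled to [0, 1]. Returns weights and hidden biases.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n_samples, n_visible = V.shape
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        # positive phase: hidden probabilities driven by the data, then a binary sample
        ph = sigmoid(V @ W + b_h)
        h = (rng.random(ph.shape) < ph).astype(float)
        # negative phase: one Gibbs step down to the visible layer and back up
        pv = sigmoid(h @ W.T + b_v)
        ph2 = sigmoid(pv @ W + b_h)
        W += lr * (V.T @ ph - pv.T @ ph2) / n_samples
        b_v += lr * (V - pv).mean(axis=0)
        b_h += lr * (ph - ph2).mean(axis=0)
    return W, b_h

def pretrain_dbn(X, layer_sizes=(120, 60, 12), **kw):
    """Greedy layer-wise pretraining: each RBM is trained on the layer below it."""
    weights, inp = [], X
    for size in layer_sizes:
        W, b_h = train_rbm(inp, size, **kw)
        weights.append(W)
        inp = sigmoid(inp @ W + b_h)   # deterministic up-pass feeds the next RBM
    return weights, inp

rng = np.random.default_rng(1)
X = rng.random((50, 200))              # stand-in for 200 min-max scaled indicators
weights, top = pretrain_dbn(X, epochs=2)
print([W.shape for W in weights])      # [(200, 120), (120, 60), (60, 12)]
```

The 12-unit top representation would then feed the BP output layer with the two ST / non-ST neurons for supervised fine-tuning.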

For the first-level extraction of the risk measure indexes, we use a top-down approach. Starting from the output layer, we find, for each output neuron, the six neurons in the third hidden layer with the greatest influence (the highest weights) and retain the eight neurons that occur most frequently among them. We then trace back in turn to the key neurons of the input layer, i.e., the indicators that significantly affect credit risk, and take the 20 indicators with the highest weights. The results are shown in Table 1.
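The backtracking procedure can be sketched as below. The paper specifies only the step from the output layer to the third hidden layer; ranking by absolute weight, reusing the same "top neurons per unit, keep the most frequent" rule at every lower hidden layer, and scoring input indicators by their total absolute weight into the retained first-hidden-layer neurons are assumptions. The function name and the toy weight matrices are hypothetical.

```python
import numpy as np
from collections import Counter

def backtrack_key_inputs(weights, top_per_neuron=6, keep=8, n_inputs=20):
    """Trace influential neurons from the output layer back to the input layer.

    weights: [W_1, ..., W_L], where W_l has shape (units below, units above) and
             W_L connects the last hidden layer to the output layer.
    """
    selected = list(range(weights[-1].shape[1]))          # all output neurons
    for W in reversed(weights[1:]):                        # down to the first hidden layer
        counts = Counter()
        for j in selected:
            strongest = np.argsort(-np.abs(W[:, j]))[:top_per_neuron]
            counts.update(strongest.tolist())
        selected = [i for i, _ in counts.most_common(keep)]  # most frequent neurons
    # score each input indicator by its total |weight| into the selected first-hidden neurons
    scores = np.abs(weights[0][:, selected]).sum(axis=1)
    ranking = np.argsort(-scores)[:n_inputs]
    return ranking.tolist(), scores[ranking]

# toy network: 5 inputs -> 4 hidden -> 3 hidden -> 2 outputs
W1 = np.zeros((5, 4))
W1[:, 3] = [0.2, 0.0, 0.9, -0.5, 0.1]
W2 = np.zeros((4, 3))
W2[:, 1] = [0.0, 0.1, 0.0, -0.7]
W3 = np.array([[0.1, 0.1], [0.9, 0.8], [0.2, 0.3]])
ranking, _ = backtrack_key_inputs([W1, W2, W3], top_per_neuron=1, keep=1, n_inputs=2)
print(ranking)  # [2, 3]
```

With the paper's settings (six neurons per unit, eight retained, 20 indicators) the same function would be called with its default arguments on the trained 200-120-60-12-2 weight matrices.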

As shown in Table 1, the weight of the debt-to-long-term-capital ratio (X1) is 60.68, much higher than the weights of the other indicators. This is plausible because the debt-to-long-term-capital ratio reflects the long-term capital structure of an enterprise: the smaller the ratio, the lower the degree of debt capitalization, the lower the pressure of long-term debt repayment, the lower the risk of default, and the smaller the credit risk. The 20 indicators also cover the company's profitability, solvency, operating capacity and development capacity fairly comprehensively.

Table 1

Ranking of indicators

2.2.2 Second-level extraction

The DBN narrows the set of candidate risk measure indexes. Before formally constructing the risk measurement model, we use EFA and CFA to identify the true risk measure indexes.

In the EFA, we first kept the indicators with factor loadings greater than 0.7, then removed indicators that have similar loadings on different factors or that load on a single factor with a loading below 0.7, and finally deleted indicators with communalities below 0.4. This left 11 indicators, as shown in Table 2.

The 11 retained indicators are then subjected to EFA again to identify the latent variable corresponding to each indicator, which completes the second-level extraction of the risk measure indexes.

Table 3 shows that, with 55 degrees of freedom (df), the Kaiser-Meyer-Olkin measure of sampling adequacy is KMO = 0.667 > 0.6, and the observed value of Bartlett's sphericity test statistic is 33 974.601 with a significance (Sig.) of 0.000 < 0.001. These results indicate that the 11 indicators are significantly correlated and that the credit risk information of listed companies they reflect partly overlaps, so we use EFA to find the latent variables behind these indicators.
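Both statistics in Table 3 follow standard formulas on the indicators' correlation matrix $R$: Bartlett's statistic is $-(n - 1 - (2p+5)/6)\ln|R|$ with $p(p-1)/2$ degrees of freedom, and KMO compares squared correlations with squared partial correlations obtained from $R^{-1}$. A NumPy sketch follows; the sample size behind Table 3 is not restated in this section, so the usage only confirms that 11 indicators give df = 55 and evaluates KMO on a toy matrix. A p-value would additionally need a chi-square survival function (for example from SciPy), which is omitted here.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's test that the p x p correlation matrix R is an identity matrix.

    Returns (chi-square statistic, degrees of freedom) for n observations.
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)           # log|R|, computed stably
    stat = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    return stat, p * (p - 1) // 2

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)            # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)      # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)

print(bartlett_sphericity(np.eye(11), n=500)[1])   # 55
R = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])                    # toy equicorrelation matrix
print(round(kmo(R), 3))                            # 0.692
```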

To make the extracted latent variables (factors) meaningful, we reduced the dimensionality of the 11 indicators according to the eigenvalue-greater-than-1 criterion and computed the eigenvalues and cumulative contribution rates. As Table 4 shows, 5 factors meet this criterion, and their cumulative contribution is 80.28%; that is, these 5 factors explain 80.28% of the total variance. Little information is lost by using these 5 factors to represent the credit risk of listed companies, so they reflect it comprehensively.
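The eigenvalue-greater-than-1 rule and the cumulative contribution can be illustrated on a toy correlation matrix: for a correlation matrix the eigenvalues sum to the number of indicators, so the contribution of the retained factors is their eigenvalue sum divided by that number. The matrix below is hypothetical, not the paper's 11-indicator matrix.

```python
import numpy as np

def kaiser_factors(R):
    """Eigenvalue > 1 rule on a correlation matrix R.

    Returns the number of retained factors, their cumulative share of the total
    variance (the trace of R, i.e. the number of indicators), and all eigenvalues.
    """
    eig = np.sort(np.linalg.eigvalsh(np.asarray(R, dtype=float)))[::-1]
    k = int(np.sum(eig > 1.0))
    return k, float(eig[:k].sum() / eig.sum()), eig

# toy correlation matrix: two pairs of indicators, r = 0.8 within each pair
R = np.array([[1.0, 0.8, 0.0, 0.0],
              [0.8, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.8],
              [0.0, 0.0, 0.8, 1.0]])
k, cum, eig = kaiser_factors(R)
print(k, round(cum, 2))  # 2 0.9
```

Here each correlated pair yields one factor with eigenvalue 1.8, so two factors are kept and they explain 3.6 / 4 = 90% of the variance, the analogue of the 5 factors and 80.28% reported in Table 4.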

Grouping the 11 indicators by their highest loadings, Table 5 gives the factor score formula for the first principal factor:

$$F_1 = 0.984X_{20} + 0.979X_7 + 0.953X_6 + 0.006X_4 + 0.020X_{17} + 0.051X_9 + 0.021X_3 - 0.012X_{12} - 0.043X_{16} + 0.006X_{15} + 0.003X_8$$

F1 has large loadings of 0.953, 0.979 and 0.984 on three indicators X6, X7 and X20, respectively. As all three indicators reflect the profitability of listed companies, F1 is defined as profitability in this paper.

The factor score formula for the second principal factor is:

$$F_2 = 0.013X_{20} + 0.024X_7 - 0.005X_6 + 0.967X_4 + 0.961X_{17} + 0.097X_9 + 0.318X_3 - 0.009X_{16} - 0.023X_{15} - 0.024X_8$$

F2 loads heavily on two indicators, X4 and X17, with loadings of 0.96 or more. Both indicators reflect the cash flow of listed companies, so F2 is defined as cash flow.

The factor score formula for the third principal factor is:

$$F_3 = 0.024X_{20} + 0.031X_7 + 0.030X_6 + 0.180X_4 + 0.205X_{17} + 0.931X_9 + 0.860X_3 + 0.028X_{12} + 0.024X_{16} + 0.049X_{15} + 0.052X_8$$

F3 loads heavily on X9 and X3, with loadings of 0.931 and 0.860, respectively. These 2 indicators reflect the solvency of listed companies, so F3 is defined as solvency.

The factor score formula for the fourth principal factor is:

$$F_4 = 0.019X_{20} + 0.017X_7 + 0.010X_6 + 0.012X_{17} - 0.005X_9 + 0.012X_3 + 0.793X_{12} - 0.789X_{16} + 0.007X_{15} + 0.005X_8$$

F4 loads heavily on X12 and X16. The loading on X12 is 0.793; this indicator reflects the profitability and operational management level of listed companies. The loading on X16 is -0.789; this indicator reflects the short-term solvency of listed companies as well as the goals they pursue in operation. F4 is therefore defined as the operation level.

The factor score formula for the fifth principal factor is:

$$F_5 = 0.002X_{20} + 0.002X_7 + 0.001X_6 + 0.002X_{17} + 0.017X_9 - 0.023X_3 + 0.006X_{12} + 0.004X_{16} + 0.737X_{15} - 0.732X_8$$

F5 loads heavily on X15 and X8, with absolute loadings above 0.7. These two indicators reflect the development ability of listed companies, so F5 is defined as development ability. According to the factor analysis above, the latent variables corresponding to these 11 indicators are these 5 principal factors, and the final extraction of the risk measure indexes is carried out on the basis of these 5 factors.

Table 2

Factor loadings and commonality of indicators

Table 3

KMO and Bartlett's sphericity test

Table 4

Total variance explained

Table 5

Rotated component matrix

2.2.3 Final extraction

Following Occam's razor, the principle of simplicity and effectiveness, our model should be as simple as possible. To find indicators that characterize the degree of risk quickly and concisely, we performed a validity analysis through CFA on the basis of the EFA results and then removed unsuitable variables to achieve the final extraction of the risk measure indexes.

Table 6 shows that the absolute value of the standardized loading coefficient of X12 on the operation level is 0.254 < 0.6, so the measurement relationship is weak. The standardized loading coefficient of X16 on the operation level is not significant (p = 0.378 > 0.05), indicating no significant measurement relationship. The absolute values of the standardized loading coefficients of X8 and X15 on development ability are 0.52 and 0.151, respectively, both less than 0.6, so these measurement relationships are also weak.

AVE and CR are used to assess convergent validity. Table 7 shows that the CR of the operation level is below 0.7, and that the AVE of development ability is below 0.5 and its CR below 0.7. The convergent validity of these two factors is therefore poor, and they need to be removed before further analysis. Combining Tables 6 and 7, the observed variables corresponding to the operation level and development ability should be deleted.

After deleting X8, X12, X15 and X16, we performed CFA again on the remaining indicators. For the measurement relationships, the absolute values of all standardized loading coefficients exceed 0.6 and are significant, implying good measurement relationships. All three factors have AVE values greater than 0.5 and CR values greater than 0.7, implying good convergent validity. For discriminant validity (Table 8), the square root of the AVE for profitability is 0.96, which exceeds the largest absolute inter-factor correlation coefficient of 0.07, implying good discriminant validity. Solvency and cash flow likewise show good discriminant validity.

Table 6

Factor loading coefficient

Table 7

AVE and CR

Table 8

Pearson correlation coefficient and square root of AVE

2.3 Risk Measurement

2.3.1 SEM assumptions

Through the extraction of the risk measure indexes, we found that profitability, cash flow and solvency are latent variables for measuring the credit risk of listed companies, and their corresponding risk measure indexes are observed variables. The observed variable corresponding to credit risk is the risk measurement (X'). We therefore propose the following hypotheses:

H1: Profitability has a significant impact on the credit risk of listed companies and each indicator of profitability indirectly affects the credit risk of listed companies through its impact on profitability;

H2: Cash flow has a significant impact on the credit risk of listed companies and each indicator of cash flow indirectly affects the credit risk of listed companies through its impact on cash flow;

H3: Solvency has a significant impact on the credit risk of listed companies, and each indicator of solvency indirectly affects the credit risk of listed companies through its impact on solvency.

2.3.2 Credit risk path analysis and risk measurement

In order to further verify the rationality of the model and the impact of each indicator, the SEM of credit risk for listed companies is constructed based on the above assumptions, and its impact path is shown in Fig. 3.

Fig. 3

Standardized path diagram of SEM

Fig. 3 shows that the exogenous variables are profitability, cash flow and solvency, denoted $\xi_1, \xi_2, \xi_3$, and the endogenous variable is credit risk, denoted $\eta$. Based on the SEM principle, the mathematical expression of the theoretical model can be obtained.

The structural and measurement models for credit risk are:

$$\eta = 0.39\,\xi_1 + 0.061\,\xi_2 + 0.094\,\xi_3, \qquad X' = 0.642\,\eta$$

As the model shows, profitability, solvency and cash flow all influence the credit risk of listed companies, to different degrees. Profitability has the largest effect, with a regression coefficient of 0.39, followed by solvency (0.094) and cash flow (0.061).

Profitability is measured by the following model:

$$X_7 = 0.978\,\xi_1, \quad X_6 = 0.905\,\xi_1, \quad X_{20} = 0.994\,\xi_1$$

The metric model for profitability is:

$$\begin{aligned}
\xi_1 &= \frac{1/0.978}{1/0.978 + 1/0.905 + 1/0.994}X_7 + \frac{1/0.905}{1/0.978 + 1/0.905 + 1/0.994}X_6 + \frac{1/0.994}{1/0.978 + 1/0.905 + 1/0.994}X_{20} \\
&= 0.3263X_7 + 0.3526X_6 + 0.3211X_{20}
\end{aligned}$$
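The weights in this metric model are the reciprocals of the standardized loadings, normalized to sum to one. The Python sketch below (the function name is ours) reproduces the printed profitability weights; applied to the cash flow loadings (0.952, 0.987) and the solvency loadings (0.697, 1), the same rule gives 0.5090 / 0.4910 and 0.5893 / 0.4107.

```python
def inverse_loading_weights(loadings):
    """Weights used in the metric models: w_i = (1 / lambda_i) / sum_j (1 / lambda_j)."""
    inv = [1.0 / lam for lam in loadings]
    total = sum(inv)
    return [v / total for v in inv]

# profitability loadings of X7, X6 and X20 from the measurement model above
print([round(w, 4) for w in inverse_loading_weights([0.978, 0.905, 0.994])])
# [0.3263, 0.3526, 0.3211]
```

Because the weights are inversely proportional to the loadings, the indicator with the largest loading (X20, 0.994) receives the smallest weight.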

Among the profitability indicators, return on total assets has the largest path coefficient, 0.994, which suggests that return on total assets is the key to corporate profitability. Together with the structural model of credit risk, this supports hypothesis H1: profitability has a significant impact on the credit risk of listed companies, and the profitability indicators indirectly affect credit risk through their impact on profitability.

Cash flow is measured by the following model:

$$X_4 = 0.952\,\xi_2, \quad X_{17} = 0.987\,\xi_2$$

The metric model for cash flow is:

$$\xi_2 = \frac{1/0.952}{1/0.952 + 1/0.987}X_4 + \frac{1/0.987}{1/0.952 + 1/0.987}X_{17} = 0.5090X_4 + 0.4910X_{17}$$

For cash flow, cash to meet investment needs has the larger path coefficient, 0.987. The larger this indicator, the higher the enterprise's capital self-sufficiency rate and the stronger its ability to sustain production and operation. Together with the structural model of credit risk, this supports hypothesis H2: cash flow has a significant impact on the credit risk of listed companies, and each cash flow indicator indirectly affects credit risk through its impact on cash flow.

Solvency is measured by the following model:

$$X_9 = 0.697\,\xi_3, \quad X_3 = \xi_3$$

The metric model for solvency is:

$$\xi_3 = \frac{1/0.697}{1/0.697 + 1/1}X_9 + \frac{1/1}{1/0.697 + 1/1}X_3 = 0.5893X_9 + 0.4107X_3$$

For solvency, the path coefficients of corporate free cash flow per share and net cash from operating activities/current liabilities are 0.697 and 1, respectively. Net cash from operating activities/current liabilities has the larger path coefficient; this ratio measures, from a cash flow perspective, a company's ability to pay off its short-term liabilities in the current period. Hypothesis H3 therefore holds: solvency has a significant impact on the credit risk of listed companies, and the solvency indicators indirectly affect credit risk through their impact on solvency.

In summary, the risk measurement is:

$$X' = 0.0248X_3 + 0.0199X_4 + 0.0883X_6 + 0.0817X_7 + 0.0356X_9 + 0.0192X_{17} + 0.0804X_{20}$$
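These coefficients can be reproduced from the reported path coefficients: each indicator's coefficient is $b\,l_k w_{kj}$, with $b = 0.642$, the structural coefficients $l_k$ (0.39, 0.061, 0.094) and the inverse-loading weights $w_{kj}$ of the three metric models. The short Python check below, a sketch whose function and dictionary key names are our own, recovers all seven published coefficients to four decimal places.

```python
def risk_measure_coefficients(b, paths, loadings):
    """Coefficient of each indicator in X' = b * sum_k l_k * xi_k,
    where xi_k is the inverse-loading-weighted sum of its indicators."""
    coeffs = {}
    for factor, l_k in paths.items():
        inv = {name: 1.0 / lam for name, lam in loadings[factor].items()}
        total = sum(inv.values())
        for name, v in inv.items():
            coeffs[name] = b * l_k * v / total   # b * l_k * w_kj
    return coeffs

paths = {"profitability": 0.39, "cash_flow": 0.061, "solvency": 0.094}
loadings = {
    "profitability": {"X7": 0.978, "X6": 0.905, "X20": 0.994},
    "cash_flow": {"X4": 0.952, "X17": 0.987},
    "solvency": {"X9": 0.697, "X3": 1.0},
}
h = risk_measure_coefficients(0.642, paths, loadings)
print({name: round(v, 4) for name, v in sorted(h.items())})
```

A company's score is then the dot product of these coefficients with its seven indicator values, which feeds the logistic step below.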

The risk level is then determined by:

$$p(X=1 \mid X') = \frac{1}{1+e^{-X'}}, \qquad p(X=0 \mid X') = 1 - p(X=1 \mid X') = \frac{e^{-X'}}{1+e^{-X'}}$$

Table 9 reports the standardized coefficient estimates, standard errors, critical ratios and p-values for the main SEM paths. In the measurement model, the regression coefficients of the observed variables on their latent variables are all above 0.5 and significant at the 0.05 level. In the structural model, the regression coefficients between the latent variables also pass the significance test at the 0.05 level. This indicates that the model meets the basic fit criteria and is identifiable.

Table 9

Summary table of model regression coefficients

2.4 Performance Evaluation

2.4.1 Statistical evaluation

In this section, to further validate the soundness of the model, we evaluate the structural equation model using model fit indices; the results are shown in Table 10.

As Table 10 shows, the fit criteria are largely met, except for χ² and χ²/df. However, these two indicators are sensitive to sample size and tend to reject even well-fitting models when the sample is large, so the model is not revised here.
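The sample-size argument can be made concrete. Under maximum-likelihood estimation, χ² = (N − 1)·F_min, so for the same degree of misfit F_min the statistic, and hence χ²/df, grows linearly with N, whereas RMSEA = √(max(χ² − df, 0) / (df(N − 1))) stays nearly constant. The sketch below uses a hypothetical misfit (F_min = 0.05) and df = 11; neither value comes from Table 10, and some software uses N instead of N − 1.

```python
import math

def chi_square_and_rmsea(f_min, df, n):
    """For a fixed minimum ML discrepancy f_min: chi2 = (n - 1) * f_min and
    RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    chi2 = (n - 1) * f_min
    rmsea = math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))
    return chi2, chi2 / df, rmsea

# the same hypothetical misfit (f_min = 0.05, df = 11) at three sample sizes
for n in (200, 2000, 20000):
    chi2, ratio, rmsea = chi_square_and_rmsea(0.05, 11, n)
    print(n, round(ratio, 2), round(rmsea, 3))
```

χ²/df rises from under 1 to about 91 across these sample sizes while RMSEA stays below 0.07, which is why large samples alone can push χ²-based criteria past their cut-offs.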

Table 10

Fit indices in the proposed model

2.4.2 Assessment on testing samples

In this section, we use the risk measurement model to assess the credit risk of the testing samples. The testing samples account for 45% of the original sample: 1 984 samples with 13 892 data points, of which 2 039 are missing, a missing rate of 14.68%. In particular, the cash to meet investment needs indicator is missing for 5 quarters, and the net cash from operating activities/current liabilities indicator is missing for 27 companies. We impute these missing values with the data preprocessing described in Section 2.1. Incomplete data inevitably limit the accuracy of the evaluation. The results are shown in Table 11.

As Table 11 shows, 802 of the non-defaulted samples and 641 of the defaulted samples were correctly assessed, so the overall prediction accuracy of the risk measurement model is (802+641)/1 984 = 72.73%. The type Ⅰ error rate, i.e., the rate at which non-defaulted samples are judged to be defaulted, is 408/(802+408) = 33.72%. The type Ⅱ error rate, i.e., the rate at which defaulted samples are judged to be non-defaulted, is 133/(133+641) = 17.18%.
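The accuracy and error-rate arithmetic for both the testing samples and the later forecast can be checked with a small helper. The four counts are those stated in Sections 2.4.2 and 2.4.3; the function and argument names are our own, and the type Ⅰ / type Ⅱ definitions follow the paper's convention (non-defaulted = non-ST).

```python
def assessment_rates(correct_nondefault, nondefault_as_default,
                     default_as_nondefault, correct_default):
    """Overall accuracy and the paper's type I / type II error rates.

    Type I: non-defaulted (non-ST) samples judged as defaulted.
    Type II: defaulted (ST) samples judged as non-defaulted.
    """
    total = correct_nondefault + nondefault_as_default + default_as_nondefault + correct_default
    accuracy = (correct_nondefault + correct_default) / total
    type1 = nondefault_as_default / (correct_nondefault + nondefault_as_default)
    type2 = default_as_nondefault / (default_as_nondefault + correct_default)
    return total, accuracy, type1, type2

for name, counts in [("testing (Table 11)", (802, 408, 133, 641)),
                     ("forecast (Table 12)", (1075, 432, 187, 576))]:
    total, acc, t1, t2 = assessment_rates(*counts)
    print(name, total, f"{acc:.2%}", f"{t1:.2%}", f"{t2:.2%}")
# testing (Table 11) 1984 72.73% 33.72% 17.18%
# forecast (Table 12) 2270 72.73% 28.67% 24.51%
```

The totals confirm that the four cells add up to 1 984 and 2 270 samples, respectively.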

Table 11

Assessment on testing samples

2.4.3 Credit risk forecasting

Finally, we forecast credit risk using data for the 5 quarters from Q2 2021 to Q2 2022 for 454 companies (300 non-ST and 154 ST companies), 2 270 samples in total. For 47 companies, individual indicators are missing entirely, and for 2 quarters the cash to meet investment needs indicator is missing entirely; we impute these values with the same interpolation methods and use the imputed data for forecasting. The results are shown in Table 12.

The results in Table 12 show that the overall prediction accuracy of the risk measurement model is (1 075+576)/2 270 = 72.73%. The type Ⅰ error rate is 432/(1 075+432) = 28.67%, and the type Ⅱ error rate is 187/(187+576) = 24.51%, 4.16 percentage points lower than the type Ⅰ error rate. Of the 2 270 samples, 1 651 were predicted correctly and 619 incorrectly: 432 non-defaulted samples were misjudged as defaulted, and 187 defaulted samples were misjudged as non-defaulted.

Table 12

Credit risk forecast

3 Conclusion

This paper uses financial data from Q1 2019 to Q2 2022 as the research data set for construction and evaluation of a novel credit risk measurement model. The measure indexes for credit risk of listed companies are obtained through DBN, EFA and CFA in turn. Then the risk measurement is obtained with SEM and logistic distribution. Finally, the performance evaluation of the risk measurement is made with statistical evaluation, assessment on testing samples and credit risk forecasting, resulting in the following conclusions:

1) The credit risk of listed companies is mainly influenced by profitability, cash flow and solvency. Profitability has the greatest influence with a regression coefficient of 0.39, cash flow has the least influence with a regression coefficient of 0.061, and solvency is in the middle with a regression coefficient of 0.094.

2) Among the profitability indicators, return on assets has the highest impact weight, annualized return on assets the second highest, and return on total assets the lowest, with impact weights of 0.352 6, 0.326 3 and 0.321 1, respectively.

3) Cash flow is mainly reflected by the net profit cash cover and cash to meet investment needs, with impact weights of 0.509 0 and 0.491 0, respectively.

4) Within solvency, corporate free cash flow per share has the larger impact weight, 0.589 3, followed by net cash from operating activities/current liabilities with an impact weight of 0.410 7.

References

  1. Yang G, Wang R, Wang S N, et al. Research on the impact of bank competition on credit risk of listed companies[J]. China Soft Science, 2021(10): 103-114(Ch).
  2. Ma X, Wei C, Han J. Credit risk assessment of Chinese listed companies based on SVM improved by shuffled frog leaping algorithm[C]// 2021 33rd Chinese Control and Decision Conference (CCDC). New York: IEEE Press, 2021: 2462-2467.
  3. Wang C P, Li C L. Credit risk measurement of listed companies based on modified KMV model[J]. Friends of Accounting, 2018(13): 93-99(Ch).
  4. Wang R. AHP-entropy method credit risk assessment based on Python[C]// 2021 Asia-Pacific Conference on Communications Technology and Computer Science (ACCTCS). New York: IEEE Press, 2021: 17-20.
  5. Machado M R, Karray S. Assessing credit risk of commercial customers using hybrid machine learning algorithms[J]. Expert Systems with Applications, 2022, 200: 116889.
  6. Hu X, Hu J, Chen L, et al. Credit risk assessment model for small, medium and micro enterprises based on RS-PSO-SVM integration[C]// 2021 IEEE 6th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA). New York: IEEE Press, 2021: 342-345.
  7. Tezerjan M Y, Samghabadi A S, Memariani A. ARF: A hybrid model for credit scoring in complex systems[J]. Expert Systems with Applications, 2021, 185(7): 115634.
  8. Shen F, Zhao X, Kou G, et al. A new deep learning ensemble credit risk evaluation model with an improved synthetic minority oversampling technique[J]. Applied Soft Computing, 2021, 98(1): 106852.
  9. Liu J M, Zhang S C, Fan H Y. A two-stage hybrid credit risk prediction model based on XGBoost and graph-based deep neural network[J]. Expert Systems with Applications, 2022, 195: 116624.
  10. Hu Y, Su J. Research on credit risk evaluation of commercial banks based on artificial neural network model[J]. Procedia Computer Science, 2022, 199: 1168-1176.
  11. Zhang K, Shi S, Liu S, et al. Research on DBN-based evaluation of distribution network reliability[C]// 7th International Conference on Renewable Energy Technologies (ICRET 2021). Kuala Lumpur: EDP Sciences, 2021, 242: 03004.
  12. Zhu J. Research of enterprise financial management capability system based on EFA method and intelligent data clustering model[C]// 2021 2nd International Conference on Smart Electronics and Communication (ICOSEC). New York: IEEE Press, 2021: 1342-1345.
  13. Liebana-Cabanillas F, Marinkovic V, Kalinic Z, et al. Predicting the determinants of mobile payment acceptance: A hybrid SEM-neural network approach[J]. Technological Forecasting and Social Change, 2018, 129: 117-130.
