Stata 17 is an integrated, general-purpose statistical software package for data analysis, data management, and graphics. The new version adds official difference-in-differences (DID) commands, improved table output, new Lasso features, and new commands for discrete choice models, among other things. The version provided this time is Stata MP 17 for Windows.
StataCorp Stata 17 New Features:
On April 20, 2021, StataCorp officially released Stata 17. Many econometrics users had barely gotten used to Stata 16 when Stata 17 arrived. As the saying goes, a craftsman who wants to do good work must first sharpen his tools. StataCorp keeps refining Stata “as if cutting and filing, as if carving and polishing”, making this sharp tool ever more sophisticated and convenient. The fundamental reason Stata has become the most popular econometrics software is that it stays very close to how econometrics is actually applied.
So, what is new in Stata 17? The main changes fall into the following ten areas, introduced below:
- Official commands for difference-in-differences (DID)
- Improved table output
- New Lasso features
- New commands for discrete choice models
- New commands for duration data
- Comprehensive upgrade of Bayesian econometrics
- Non-parametric trend tests
- New commands for meta-analysis
- Integration of Stata with Python, Java, H2O and Jupyter Notebook
- Do-file Editor improvements, faster Stata, etc.
1. Official commands for difference-in-differences
- Difference-in-differences (DID for short) is perhaps the most commonly used method in empirical econometrics, yet Stata long had no official DID command. Stata 17 fills this gap with the official DID command xtdidregress; the “xt” prefix indicates that it is intended for panel data.
- Beyond standard DID estimation, xtdidregress lets you specify up to three group variables, or two group variables and a time variable, to estimate a difference-in-differences-in-differences (DDD) model.
- For repeated cross-sectional data (sometimes called “pseudo-panel data”), Stata 17 also introduces the related command didregress, which performs DID estimation on such data. Better still, the official DID commands make it easy to draw parallel-trends plots.
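To make concrete what a DID (and DDD) estimate measures, here is a minimal Python sketch for the simplest two-group, two-period case. It only reproduces the point estimate from cell means; xtdidregress and didregress fit regression models and also report standard errors, which this sketch does not attempt. In this balanced 2x2 case, the cell-means DID equals the coefficient on the treated x post interaction in an OLS regression. The function names and toy data are hypothetical.

```python
from statistics import mean

def did_2x2(rows):
    """Difference-in-differences point estimate from the four cell means.

    rows: iterable of (treated, post, y), with treated and post coded 0/1.
    Returns (treated post - treated pre) - (control post - control pre).
    """
    cells = {(g, t): [] for g in (0, 1) for t in (0, 1)}
    for treated, post, y in rows:
        cells[(treated, post)].append(y)
    m = {cell: mean(values) for cell, values in cells.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])

def ddd_2x2x2(rows):
    """Triple difference: the DID in subgroup third == 1 minus the DID in subgroup third == 0.

    rows: iterable of (treated, post, third, y), all indicators coded 0/1.
    """
    by_third = {0: [], 1: []}
    for treated, post, third, y in rows:
        by_third[third].append((treated, post, y))
    return did_2x2(by_third[1]) - did_2x2(by_third[0])

# Hypothetical toy data: (treated, post, outcome)
data = [
    (1, 0, 10.0), (1, 0, 12.0), (1, 1, 18.0), (1, 1, 20.0),
    (0, 0, 9.0), (0, 0, 11.0), (0, 1, 12.0), (0, 1, 14.0),
]
print(did_2x2(data))  # (19 - 11) - (13 - 10) = 5.0
```

The control group's change (3.0) serves as the counterfactual trend for the treated group; this is exactly what the parallel-trends assumption justifies, which is why parallel-trends plots matter.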
2. Improved table output
- Empirical researchers often need to export several regression results from Stata to Word as a table. The official command estimates table can do this, but it is fairly rigid, so Stata users have generally relied on community-contributed commands (such as estout or outreg) instead. In response, Stata 17 greatly improves the existing table command, so that users can easily report regression results or summary statistics in tabular form.
- You can also design a table style, apply it to the tables you create, and export them to Word or other file formats (including PDF, HTML, LaTeX, Excel, and Markdown). In addition, the new prefix collect gathers the results produced by Stata commands. Finally, Stata 17 adds the Table Builder, which lets users create tables by point-and-click.
3. New Lasso features
- Lasso (Least Absolute Shrinkage and Selection Operator) is a common tool for high-dimensional regression, and Stata 16 introduced a suite of official Lasso commands. Stata 17 adds more Lasso features.
- Treatment-effects estimation with Lasso. In Stata 16, the teffects command estimates treatment-effects models, and the lasso command estimates high-dimensional models with many covariates. Stata 17 combines the two: the new telasso command estimates treatment-effects models that include many covariates.
- Using BIC to choose the Lasso penalty parameter. Lasso is a form of penalized regression, so estimation requires choosing a penalty parameter. In Stata 16, the penalty parameter can be selected by cross-validation, by the adaptive method, or by a plugin formula.
- In Stata 17, the option selection(bic) has been added, so the penalty parameter can be selected using the Bayesian information criterion (BIC). Moreover, the new postestimation command bicplot makes it easy to visualize this selection process.
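The idea behind BIC-based penalty selection can be sketched as follows: fit the lasso at each candidate penalty, score each fit by BIC, and keep the minimum. This is a minimal Python sketch under stated assumptions, not Stata's implementation: it assumes centered data with no intercept, a plain cyclic coordinate-descent solver, and the Gaussian form BIC = n ln(RSS/n) + df ln(n) with df equal to the number of nonzero coefficients. Stata's own objective scaling, BIC definition, and penalty grid may differ. All names and the simulated data are hypothetical.

```python
import math
import random

def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) used by the lasso update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n))*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes y and every column of X are centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # y - Xb, with b = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            # Inner product of column j with the partial residual (b_j added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b

def bic(X, y, b):
    """Gaussian BIC: n*ln(RSS/n) + df*ln(n), with df = number of nonzero coefficients."""
    n = len(y)
    rss = sum((y[i] - sum(xij * bj for xij, bj in zip(X[i], b))) ** 2 for i in range(n))
    df = sum(1 for bj in b if bj != 0.0)
    return n * math.log(rss / n) + df * math.log(n)

def select_lambda_by_bic(X, y, lambdas):
    """Fit the lasso at each candidate penalty; return (bic, lambda, coefs) with minimum BIC."""
    fits = []
    for lam in lambdas:
        b = lasso_cd(X, y, lam)
        fits.append((bic(X, y, b), lam, b))
    return min(fits, key=lambda fit: fit[0])

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

# Simulated data (illustrative only): y depends on x1 but not on x2 or x3.
rng = random.Random(42)
n = 100
cols = [center([rng.gauss(0, 1) for _ in range(n)]) for _ in range(3)]
X = [[cols[j][i] for j in range(3)] for i in range(n)]
y = center([2.0 * X[i][0] + rng.gauss(0, 1) for i in range(n)])

best_bic, best_lam, best_b = select_lambda_by_bic(X, y, [1.0, 0.5, 0.2, 0.1, 0.05, 0.01])
print(best_lam, [round(bj, 2) for bj in best_b])
```

The ln(n) term charges each extra nonzero coefficient a fixed price, so BIC tends to pick sparser models than cross-validation would.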
4. New commands for discrete choice models
- The discrete choice model is a commonly used model in microeconometrics. Stata 17 adds the following new commands for discrete choice models:
- Panel multinomial logit model. For cross-sectional data, Stata already has the mlogit command for multinomial logit models. The new xtmlogit command in Stata 17 estimates multinomial logit models with panel data. This is a major step forward for discrete choice modeling in Stata, because previously its official panel commands for such models, xtlogit and xtprobit, handled only binary choices.
- Zero-inflated ordered logit model. Ordered outcomes are usually estimated with ologit or oprobit. In practice, however, the lowest category sometimes contains a disproportionately large share of the observations. If the lowest category is coded as “zero”, this is the so-called “zero inflation” phenomenon. In that case, you can use Stata 17's new ziologit command to estimate a zero-inflated ordered logit model, which accounts for the excess zeros.
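To show how zero inflation changes the category probabilities, here is a minimal Python sketch. It assumes the standard zero-inflated ordered-choice structure: a logit equation with index zg decides whether an observation is an "excess" zero, and otherwise the outcome follows an ordered logit with index xb and increasing cutpoints. We believe this matches the model ziologit fits, but its exact parameterization should be checked in the Stata manual; the function names and parameter values are hypothetical.

```python
import math

def logistic(t):
    """Standard logistic CDF F(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def ordered_logit_probs(xb, cutpoints):
    """P(y = j), j = 0..K-1, for an ordered logit with index xb and K-1 increasing cutpoints."""
    cdf = [0.0] + [logistic(k - xb) for k in cutpoints] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cutpoints) + 1)]

def zi_ordered_logit_probs(xb, cutpoints, zg):
    """Zero-inflated ordered logit probabilities.

    With probability q = F(zg) the observation is an excess zero; otherwise it is
    drawn from the ordered logit. So P(0) = q + (1 - q) * P*(0) and
    P(j) = (1 - q) * P*(j) for j > 0.
    """
    q = logistic(zg)
    base = ordered_logit_probs(xb, cutpoints)
    probs = [(1.0 - q) * p for p in base]
    probs[0] += q
    return probs

p = zi_ordered_logit_probs(xb=0.5, cutpoints=[-1.0, 0.5, 2.0], zg=0.0)
print([round(v, 3) for v in p], sum(p))
```

The inflation equation only moves probability mass into the lowest category, which is why a plain ologit fitted to such data would misrepresent the other categories.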
5. New commands for duration data
- Duration data are widely used in survival analysis in biostatistics, and also in economics, for example the duration of unemployment, the duration of a marriage, or the lifespan of a dynasty. Duration data often suffer from censoring: for example, when the study ends, some patients may still be alive, or some unemployed people may not yet have found a job.
- Stata 17's new command stintcox fits the Cox model to a special kind of data: interval-censored data. With interval-censored data, we know only that the event occurred within a certain interval, not exactly when; for example, we may know only that a cancer recurred at some point between two medical examinations. Ignoring interval censoring in duration data leads to biased estimates.
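stintcox fits a semiparametric Cox model, whose estimation is more involved. As a simpler illustration of the key idea, the sketch below swaps in a parametric exponential survival model: an observation known only to fail somewhere in (L, R] contributes S(L) - S(R) to the likelihood, instead of a density evaluated at an exact event time. This is a sketch under that assumption, not how stintcox works internally; the data and names are hypothetical.

```python
import math

def exp_survival(t, rate):
    """Survival function S(t) = exp(-rate * t) of an exponential model, with S(inf) = 0."""
    return 0.0 if math.isinf(t) else math.exp(-rate * t)

def interval_censored_loglik(intervals, rate):
    """Log-likelihood when each event is only known to lie in (left, right].

    Each observation contributes log(S(left) - S(right)); right = inf encodes a
    right-censored observation, whose contribution reduces to log S(left).
    """
    return sum(math.log(exp_survival(left, rate) - exp_survival(right, rate))
               for left, right in intervals)

def fit_rate(intervals, lo=1e-6, hi=10.0, iters=200):
    """Maximize the log-likelihood over the rate by ternary search.

    Valid here because this log-likelihood is concave in the rate.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if interval_censored_loglik(intervals, m1) < interval_censored_loglik(intervals, m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

# Hypothetical data: each event is seen only between two check-ups;
# the last subject is still event-free at t = 3 (right-censored).
intervals = [(0.0, 1.0), (1.0, 2.0), (1.0, 2.0), (2.0, 4.0), (3.0, math.inf)]
rate_hat = fit_rate(intervals)
print(round(rate_hat, 4))
```

Treating each interval's midpoint as an exact event time would replace S(L) - S(R) with a density term, which is the kind of shortcut the text warns can bias estimates.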
6. Comprehensive upgrade of Bayesian econometrics
- In the era of big data, as data become more complex and diverse, traditional frequentist econometric methods can be inconvenient for some problems, and Bayesian econometrics has been gaining ground. Frequentists treat the parameters to be estimated as fixed unknown constants, whereas Bayesians treat unknown parameters as random variables that follow some distribution, and update the “prior distribution” to a “posterior distribution” whenever new sample information arrives. Stata 17 comprehensively upgrades Stata's Bayesian statistics and econometrics features.
- Bayesian panel-data models. Stata's existing panel commands include xtreg (static panel models), xtlogit and xtprobit (panel binary choice models), and xtologit and xtoprobit (panel ordered choice models). In Stata 17, to estimate these panel models by Bayesian methods, just add the prefix bayes before the original command.
- Bayesian VAR models. The vector autoregression (VAR) is a common time-series model. In Stata, the var command estimates VAR models, and postestimation commands include fcast for dynamic forecasts and irf for impulse response functions (IRFs) and forecast-error variance decompositions (FEVDs).
- In Stata 17, you can use “bayes: var” (that is, add the prefix bayes before var) to estimate a Bayesian VAR model, and then use bayesfcast for dynamic forecasts; IRFs and FEVDs can be obtained similarly with bayesirf.
- Using a Bayesian approach to estimate a VAR model has two major benefits. First, a VAR model usually contains many parameters, so with a small sample the estimates are unstable. Bayesian estimation is more robust in small samples because it makes it easy to incorporate prior information.
- Second, the classic VAR model relies on large-sample theory for statistical inference and forecasting, which requires assuming that the estimator is asymptotically normal, an assumption that may not hold well in small samples. The Bayesian method relies on neither large-sample theory nor asymptotic normality, so it is better suited to small samples.
- Bayesian multilevel models. In Stata 17, the bayesmh command can estimate a range of Bayesian multilevel models, including univariate and multivariate linear and nonlinear multilevel models, joint longitudinal and survival-time models, and SEM-type models.
- Bayesian linear and nonlinear DSGE models. The dynamic stochastic general equilibrium (DSGE) model is the mainstream model of macroeconomics. In Stata 16, the commands dsge and dsgenl can be used to estimate linear and nonlinear DSGE models, respectively.
- In Stata 17, simply adding the prefix bayes before dsge or dsgenl estimates the corresponding linear or nonlinear Bayesian DSGE model. More than 30 prior distributions are available, along with Bayesian IRF analysis, interval hypothesis testing, and Bayes factors for model comparison.
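Stata's bayes prefix samples from the posterior by MCMC, but in the simplest conjugate case the prior-to-posterior update has a closed form, which makes two points above easy to see: the prior is updated as new data arrive, and prior information stabilizes small-sample estimates. The sketch below uses a toy normal-mean model with known noise variance, not a VAR; the function name and numbers are hypothetical.

```python
def normal_posterior(prior_mean, prior_var, data, noise_var):
    """Conjugate update for the mean mu of normal data with known variance.

    Prior: mu ~ N(prior_mean, prior_var). Data: y_i ~ N(mu, noise_var).
    Returns the posterior mean and variance of mu.
    """
    n = len(data)
    prior_prec = 1.0 / prior_var  # precision = 1 / variance
    data_prec = n / noise_var
    post_var = 1.0 / (prior_prec + data_prec)
    xbar = sum(data) / n
    post_mean = post_var * (prior_prec * prior_mean + data_prec * xbar)
    return post_mean, post_var

# Small sample: the posterior mean is pulled from the sample mean (4.0) toward the prior mean (0.0).
small = normal_posterior(0.0, 1.0, [3.0, 5.0], noise_var=4.0)
# Large sample: the data dominate and the posterior mean approaches 4.0.
large = normal_posterior(0.0, 1.0, [4.0] * 100, noise_var=4.0)
# Updating in two steps (the first posterior becomes the new prior) gives the same answer.
step1 = normal_posterior(0.0, 1.0, [3.0], noise_var=4.0)
step2 = normal_posterior(step1[0], step1[1], [5.0], noise_var=4.0)
print(small, large, step2)
```

The posterior mean is a precision-weighted average of the prior mean and the sample mean, so the prior matters most exactly when the sample is small.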
7. Non-parametric trend test
- Sometimes the sample is divided into groups (for example, 3 groups) that have a natural order (for example, groups 1, 2, and 3), the so-called “ordered groups”. With such data, we often want to test whether a variable shows a trend across the ordered groups, for example whether its values tend to increase from group 1 to group 3. This is the so-called “test for trend across ordered groups”.
- Stata's existing nptrend command performs Cuzick's nonparametric rank test for this purpose. In Stata 17, nptrend adds three more nonparametric tests alongside the Cuzick test: the Cochran-Armitage test, the Jonckheere-Terpstra test, and the linear-by-linear trend test, which greatly extends what nptrend can do.
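As an illustration of one of the new tests, here is a minimal Python sketch of the Jonckheere-Terpstra statistic. J counts, over every pair of ordered groups, how often an observation in the earlier group is smaller than one in the later group (ties count 1/2); large J suggests an increasing trend. The z-score uses the textbook no-ties normal approximation. nptrend's implementation (tie corrections, p-values) may differ, and the function name is hypothetical.

```python
import math

def jonckheere_terpstra(groups):
    """Jonckheere-Terpstra statistic for an increasing trend across ordered groups.

    groups: list of samples, ordered from the lowest to the highest group.
    Returns (J, z), where z uses the no-ties normal approximation.
    """
    J = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    if x < y:
                        J += 1.0
                    elif x == y:
                        J += 0.5  # ties count as half
    sizes = [len(g) for g in groups]
    N = sum(sizes)
    mean_J = (N ** 2 - sum(n ** 2 for n in sizes)) / 4.0
    var_J = (N ** 2 * (2 * N + 3) - sum(n ** 2 * (2 * n + 3) for n in sizes)) / 72.0
    z = (J - mean_J) / math.sqrt(var_J)
    return J, z

# Three ordered groups whose values tend to rise from group 1 to group 3.
print(jonckheere_terpstra([[1.2, 2.0, 1.8], [2.5, 2.1, 3.0], [3.3, 2.9, 4.1]]))
```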
8. New commands for meta-analysis
- A “meta-analysis” combines the results of multiple similar studies. For example, the efficacy of a vaccine may have been measured in many trials around the world; a meta-analysis weights the efficacy estimates from the individual trials to obtain a single overall measure. Stata 17 further extends Stata's meta-analysis capabilities.
- Multivariate meta-analysis. When combining the results of multiple studies, each study may report several effect sizes at once, and these effect sizes may be correlated. Stata 17's new command meta mvregress performs multivariate meta-analysis and accounts for this correlation.
- Galbraith plots. Stata 17 also adds the command meta galbraithplot, which draws Galbraith plots for meta-analysis. These plots can be used to assess heterogeneity across studies and to spot potential outliers.
- Leave-one-out meta-analysis. Stata 17 adds leave-one-out meta-analysis, which repeats the meta-analysis with each study omitted in turn, for example to check whether the overall result depends too heavily on any single study. When running meta summarize or meta forestplot, specify the new option leaveoneout to perform a leave-one-out meta-analysis.
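The weighting, Galbraith, and leave-one-out ideas above can be sketched in a few lines of Python. This sketch uses the simple common-effect (fixed-effect) inverse-variance method; Stata's meta commands default to a random-effects model, which is not implemented here. A Galbraith plot places each study at precision 1/se (x-axis) against its standardized effect effect/se (y-axis). The function names and study data are hypothetical.

```python
import math

def pooled_fixed_effect(effects, ses):
    """Inverse-variance weighted (common-effect) pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    est = sum(w * e for w, e in zip(weights, effects)) / total
    return est, math.sqrt(1.0 / total)

def leave_one_out(effects, ses):
    """Re-pool the studies k times, omitting study i on the i-th pass."""
    results = []
    for i in range(len(effects)):
        rest_e = effects[:i] + effects[i + 1:]
        rest_s = ses[:i] + ses[i + 1:]
        results.append(pooled_fixed_effect(rest_e, rest_s))
    return results

def galbraith_points(effects, ses):
    """Galbraith plot coordinates: (precision 1/se, standardized effect effect/se)."""
    return [(1.0 / se, e / se) for e, se in zip(effects, ses)]

# Hypothetical studies: the fourth has an unusually large effect.
effects = [0.5, 0.6, 0.55, 1.5]
ses = [0.1, 0.1, 0.2, 0.2]
print(pooled_fixed_effect(effects, ses))
for i, (est, se) in enumerate(leave_one_out(effects, ses)):
    print(f"without study {i + 1}: {est:.3f} (se {se:.3f})")
print(galbraith_points(effects, ses))
```

Here dropping the fourth study moves the pooled estimate from 0.645 to 0.55, the kind of sensitivity a leave-one-out analysis is designed to reveal.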
9. Integration of Stata with Python, Java, H2O and Jupyter Notebook
- In the era of big data, Stata is also accelerating its integration with mainstream software platforms to offer users more value, and this is especially prominent in the Stata 17 upgrade.
- Integration with Python. Python has become a mainstream programming language. Stata 16 added an interface to Python that lets users call Python from within the familiar Stata interface and display the results in Stata. Stata 17 goes a step further with the new Python package pystata, which lets users easily call Stata from Python. Stata 17 also introduces the umbrella term “PyStata”, covering all the ways Stata interacts with Python.
- Integration with Java. In Stata 17, you can easily embed and execute Java code in Stata programs.
- Support for JDBC. JDBC (Java Database Connectivity) is a cross-platform standard for exchanging data between programs and databases. In Stata 17, JDBC support lets Stata users import data from many popular databases, including Oracle, MySQL, Amazon Redshift, Snowflake, and Microsoft SQL Server.
- Integration with H2O. H2O is a popular machine learning platform. In Stata 17, you can connect to H2O and call its machine learning algorithms, which opens another window onto machine learning for Stata users.
10. Do-file Editor improvements, faster Stata, etc.
- Do-file Editor improvements. As programming becomes increasingly important, Stata 16 added autocompletion and syntax highlighting to the Do-file Editor. Stata 17 improves the Do-file Editor further.
- In Stata 17's Do-file Editor, you can set bookmarks to jump quickly to the part of a long do-file you want to edit. The editor also adds a new navigation control that lists all bookmarks and their labels, as well as all programs defined in the do-file.
- Faster Stata. In the era of big data, the speed of the underlying algorithms matters more and more. Stata 17 updates the algorithms behind the sort and collapse commands to make them faster. It also speeds up the mixed command, which estimates multilevel mixed-effects models.
- Speedups from the Intel Math Kernel Library (MKL). Stata 17 uses the Intel Math Kernel Library (MKL) on all 64-bit Intel and AMD computers, so that Stata can call MKL's deeply optimized implementation of LAPACK (Linear Algebra PACKage). This further speeds up the underlying computations in Stata and Mata, with no action required from users.