• Systematic Review
  • Open access
  • Published: 27 May 2020

A systematic review on spatial crime forecasting

  • Ourania Kounadi 1 ,
  • Alina Ristea   ORCID: orcid.org/0000-0003-2682-1416 2 , 3 ,
  • Adelson Araujo Jr. 4 &
  • Michael Leitner 2 , 5  

Crime Science volume 9, Article number: 7 (2020)


Background

Predictive policing and crime analytics with a spatiotemporal focus are receiving increasing attention among a variety of scientific communities and are already being implemented as effective policing tools. The goal of this paper is to provide an overview and evaluation of the state of the art in spatial crime forecasting, focusing on study design and technical aspects.

Methods

We follow the PRISMA guidelines for reporting this systematic literature review and analyse 32 papers published from 2000 to 2018, selected from 786 papers that entered the screening phase and a total of 193 papers that went through the eligibility phase. The eligibility phase applied several criteria that were grouped into: (a) publication type, (b) relevance to the research scope, and (c) study characteristics.

Results

The most predominant type of forecasting inference is the hotspot (i.e., binary classification) method. Traditional machine learning methods were used most often, followed by kernel density estimation based approaches and, less frequently, point process and deep learning approaches. The top evaluation performance measures are the Prediction Accuracy, followed by the Prediction Accuracy Index and the F1-Score. Finally, the most common validation approach was the train-test split, while other approaches include cross-validation, leave one out, and the rolling horizon.

Limitations

Current studies often lack clear reporting of study experiments and feature engineering procedures, and use inconsistent terminology to address similar problems.

Conclusions

There is a remarkable growth in spatial crime forecasting studies as a result of interdisciplinary technical work done by scholars of various backgrounds. These studies address the societal need to understand and combat crime as well as the law enforcement interest in almost real-time prediction.

Implications

Although we identified several opportunities and strengths, there are also some weaknesses and threats for which we provide suggestions. Future studies should not neglect the juxtaposition of (existing) algorithms, whose number is constantly increasing (we list 66). To allow comparison and reproducibility of studies, we outline the need for a protocol or standardization of spatial forecasting approaches and suggest the reporting of a study's key data items.

Background

Environmental criminology provides an important theoretical foundation for exploring and understanding spatial crime distribution (Bruinsma and Johnson 2018). The occurrence of crime within an area fluctuates from place to place. Moreover, crime occurrences depend on a multitude of factors, and they show an increased strategic complexity and interaction with other networks, such as institutional or community-based ones. In criminology research, these factors are primarily referred to as crime attractors and crime generators (Kinney et al. 2008). Spatial fluctuations and dependencies on attractors and generators suggest that crime is not random in time and space. A strong foundation for spatial predictive crime analytics is the Crime Pattern Theory (Brantingham and Brantingham 1984). It is used to explain why crimes occur in specific areas, suggests that crime is not random, and that it can be organized or opportunistic. In particular, it shows that when the activity space of a victim intersects with the activity space of an offender, there are higher chances for a crime to occur. The activity perimeter of a person is spatially constrained by locations that are attended (nodes). For example, if one of the personal nodes is in a high-crime neighbourhood, criminals come across new opportunities to offend.

If crime is not random it can be studied, and as such, its patterns, including the spatial component, can be modelled. As a consequence, environmental criminology theories have been tested scientifically and in the past decade various research fields have made much progress in developing methods for (spatiotemporal) crime prediction and evaluation (Caplan et al. 2011 ; Mohler et al. 2011 , 2015 ; Perry 2013 ; Wang and Brown 2011 ; Yu et al. 2011 ).

Most prediction techniques are used for retrospective forecasting, i.e., predicting the future from historical data. Historical crime data are used alone or together with crime attractors and generators (which can be demographic, environmental, etc.) in diverse types of prediction models (Mohler et al. 2011; Ohyama and Amemiya 2018; Yu et al. 2011). Apart from static predictors, such as demographic or socio-economic variables, researchers have recently included dynamic space and time features, thus giving a boost to predicting crime occurrences. These dynamic features include social media data (Al Boni and Gerber 2016; Gerber 2014; Kadar et al. 2017; Wang et al. 2012; Williams and Burnap 2015) and taxi pick-up and drop-off data (Kadar and Pletikosa 2018; Wang et al. 2016; Zhao and Tang 2017).

Although current crime prediction models show increasing accuracy, little emphasis has been placed on drawing the empirical and technical landscape to outline strengths and opportunities for future research, but also to identify weaknesses and threats. In this paper, we focus on spatial crime forecasting, which is the spatial forecasting of crime-related information. It has many applications, such as the spatial forecast of the number of crimes, the type of criminal activity, the next location of a crime in a series, or other crime-related information. At this point, we should note that we came across papers that claim to do spatial crime forecasting or crime forecasting while extrapolating in space or detecting spatial clusters. Overall, papers in the field of spatial crime analysis use the term prediction synonymously with forecasting and have a preference for the term prediction (Perry 2013). However, there are several spatial prediction types, with applications of interpolation or extrapolation. Forecasting is a prediction that extrapolates an estimated variable into a future time. While prediction can be synonymous with forecasting, it is often also used to infer unknown values regardless of the time dimension (e.g., predict the crime in area A using a model derived from area B). Cressie (1993, pp. 105-106) refers to spatial prediction as an inference process to predict values at unknown locations from data observed at known locations. His terminology includes the temporal notions of smoothing or interpolation, filtering, and prediction, which traditionally use time units instead of locations. As a result, searching for forecasting literature requires adding the term "prediction", which yields a much larger pool of papers than those that actually perform "only" forecasting. In this paper, we define the term "Spatial Crime Forecasting" as an inference approach about crime both in time and in space. In the box below, we add definition boundaries by describing variations of forecasting approaches that we consider in our study.

We are driven by the need to harmonize existing concepts and methodologies within and between criminology, sociology, geography, and computer science communities. The goal of this paper is to conduct a systematic literature review in spatial crime predictive analytics, with a focus on crime forecasting, to understand and evaluate the state of the art concerning concepts and methods given the unprecedented pace of published empirical studies. Below, we list the research questions of this study.

What are the types of forecasted information for which space plays a significant role? (“ Overview of selected publications on spatial crime forecasting ” section).

What are the commonly used forecasting methods? (“ Spatial crime forecasting methods ” section).

Which are the technical similarities and differences between spatial crime forecasting models? (“ Spatial crime forecasting methods ” section).

How is predictive performance being measured in spatial crime forecasting? (“ Considerations when analysing forecasting performance ” section).

What are the commonly used model validation strategies? (“ Considerations when analysing forecasting performance ” section).

What are the main dependencies and limitations of crime forecasting performance? (“ Considerations when analysing forecasting performance ” section).

Before presenting the results ("Results" section) and discussing them in the form of a SWOT analysis ("Discussion" section), we summarize previous literature on crime prediction and analytics ("Related work" section) and then present the methodology used to select the papers and ensure study quality ("Methods" section). Last, in the "Conclusion" section we conclude with the main findings for each research question. With our work, we aim to shed light on future research directions and indicate pitfalls to consider when performing spatial crime forecasting.

Related work

The papers identified as review or related-work studies (a total of 13) date back to 2003 and are connected to the keyword strategy that we used (further details in "Study selection" section). In addition to review papers (a total of 9), we also include two editorials, one book chapter, and one research paper, because they contain an extensive literature review in the field of crime predictive analytics.

Five papers focus on data mining with a much broader scope than our topics of interest, i.e., prediction, forecasting, or spatial analysis. The oldest one proposes a framework for crime data mining (Chen et al. 2004). It groups mining techniques into eight categories: (a) entity extraction (usage example: to identify persons), (b) clustering (usage example: to distinguish among groups belonging to different gangs), (c) association rule mining (usage example: to detect network attacks), (d) sequential pattern mining (usage example: same as before), (e) deviation detection (usage example: to identify fraud), (f) classification (usage example: to identify e-mail spamming), (g) string comparator (usage example: to detect deceptive information), and (h) social network analysis (usage example: to construct the criminal's role in a network). Association rule mining, clustering, and classification are the ones that have been discussed in other crime data mining reviews, such as for the identification of criminals (i.e., profiling) (Chauhan and Sehgal 2017), applications to solve crimes (Thongsatapornwatana 2016), and applications of criminal career analysis, investigative profiling, and pattern analysis (with respect to criminal behaviour) (Thongtae and Srisuk 2008). Furthermore, Hassani et al. (2016) conducted a recent in-depth review that looked at over 100 applications of crime data mining. Their taxonomy of applications identifies five types that include those previously described by Chen et al. (2004), with the exception of sequential pattern mining, deviation detection, and string comparator. Regarding specific algorithms, the emphasis is put on three types, namely decision trees, neural networks, and support vector machines. The work by Chen et al. (2004) covers a broad spectrum of crime analysis and investigation and, as such, identifies a couple of studies related to hotspot detection and forecasting under the mining categories of clustering and classification. These technical review studies gave us examples of the data items that we need to extract and analyse, such as the techniques that are used and the tasks that are performed (Thongsatapornwatana 2016), as well as the study purpose and region (Hassani et al. 2016).

The oldest paper that is still relevant to our work is an editorial to six crime forecasting studies (Gorr and Harries 2003). The authors refer to crime forecasting as a new application domain, which includes the use of geographical information systems (GIS), performs long- and short-term prediction with univariate and multivariate methods, and uses fixed-boundary versus ad hoc areal units for space and time-series data. More than 15 years later, this application domain is no longer new but it still involves the same characteristics as described above. Another editorial, by Kennedy and Dugato (2018), introduces a special issue on spatial crime forecasting using the Risk Terrain Modelling (RTM) approach. The focus of most papers is to analyse factors that lead to accurate forecasts, because the RTM approach is founded on the Theory of Risky Places by Kennedy and Caplan (2012). This theory starts with the proposition that places vary in terms of risk due to the spatial influence of criminogenic factors. Last, a recent review study summarizes past crime forecasting studies of four methods, namely support vector machines, artificial neural networks, fuzzy theory, and multivariate time series (Shamsuddin et al. 2017). The authors suggest that researchers propose hybrid methods to produce better results. In our study, we group and discuss a much wider number of methods (a list of 66 in Additional file 1C) and we also identified hybrid approaches (e.g., ensemble methods), one of which dates back to 2011.

In addition, we identified two papers that describe spatial methods for spatial crime prediction per se. The paper by Bernasco and Elffers (2010) discusses statistical and spatial methods to analyse crime. They interestingly distinguish two types of spatial outcomes for modelling: spatial distribution and movement. When it comes to spatial distribution, which is relevant to the scope of our paper, the authors describe the following spatial methods: spatial regression models, spatial filtering, geographically weighted regression, and multilevel regression with spatial dependence. The paper by Chainey et al. (2008) focuses on hotspot mapping as a basic approach to crime prediction. The techniques they describe and empirically examine are spatial ellipses, thematic mapping of geographic areas, grid thematic mapping, and Kernel Density Estimation (KDE). Among these, KDE yielded the highest prediction accuracy index (PAI) score. Surprisingly, most of these spatial methods (with the exception of KDE and RTM) have not been used by the authors of our selected papers (see methods discussed in "Spatial crime forecasting methods" section).

Regarding predictive policing, a recent review explains its definition, how it works, and how to evaluate its effectiveness, and it also provides an overview of existing (mostly commercial) applications (Hardyns and Rummens 2018). One of the innovative aspects of this review is the section on the evaluation of predictive policing using three criteria, namely the correctness of the prediction, the effect of predictive policing implementations on actual crime rates, and the costs relative to the methods being replaced. The authors of this paper support the definition of predictive policing that originates from Ratcliffe (2015, p. 4), which reads: "the use of historical data to create a spatiotemporal forecast of areas of criminality or crime hot spots that will be the basis for police resource allocation decisions with the expectation that having officers at the proposed place and time will deter or detect criminal activity". In general, spatial crime forecasting has a broader scope and is not synonymous with predictive policing. In addition, the papers that we examine do not aim at assisting policing decisions (although this can be an indirect consequence) but have an academic and explanatory focus. However, the first criterion, the effectiveness of the predictive analysis, as framed by Hardyns and Rummens (2018), is highly connected to our scope and is thus further analysed, from a technical perspective, in "Considerations when analysing forecasting performance" section.

A second predictive policing systematic review by Seele ( 2017 ) examines the potential of big data to promote sustainability and reduce harm and also discusses ethical and legal aspects linked to predictive algorithms. Similarly, Ozkan ( 2018 ) also reviews big data for crime research. This paper provides a critical discussion on the benefits and limitations of data-driven research and draws attention to the imminent threat of eliminating conventional hypothesis testing, which has traditionally been an integral scientific tool for social scientists and criminologists.

Except for Seele (2017), no other related-work study follows a systematic procedure regarding the methods to identify and select relevant research, and thereafter to collect and analyse data from it. Also, our work focuses only on spatial crime forecasting, which is narrower than crime data mining and broader than predictive policing, as discussed above. Last, we aim to contribute a scientific reference for technical issues in future studies. To achieve this, we follow a review protocol ("Methods" section) to answer six research questions (mentioned in "Background") that have not been systematically addressed by previous studies.

Study selection

This study follows the reporting guidance “PRISMA” (Preferred Reporting Items for Systematic reviews and Meta-Analyses) (Liberati et al. 2009 ). PRISMA suggests a checklist of 27 items regarding the sections of a systematic literature review and their content, as well as a four-phase flow diagram for the selection of papers. We adopted and modified the PRISMA guidance according to the needs of our study. Our flow diagram contains three phases for the selection of papers. The first phase is “identification” and involves the selection of information sources and a search strategy that yields a set of possible papers. The second phase is “screening” the selected papers from the first phase, and removing the ones that are not relevant to the research scope. The third phase is “eligibility”, in which we applied a more thorough reading of papers and selected the ones that are relevant to our research questions. The count of papers in each phase and their subsequent steps are illustrated in Fig.  1 .

Fig. 1 The three phases of the study selection process: identification, screening, and eligibility

The number of papers selected in the Identification phase is based on eleven keywords related to crime prediction (i.e., predict crime, crime predictive, predictive policing, predicting crimes, crime prediction, crime forecasting, crime data mining, crime mining, crime estimation, crime machine learning, crime big data). In addition, we added seven more spatially explicit terms (i.e., crime hotspot, spatial crime prediction, crime risk terrain modelling, spatial crime analysis, spatio-temporal modelling crime, spatiotemporal modelling crime, near-repeat crime). In a subsequent analysis, we visualized the word frequency of the titles of the selected papers as evidence of the relevance of the keywords used. This can be found in Additional file 1B: A word cloud of high-frequency words extracted from the titles of the selected papers.

Next, we selected information sources to perform literature searches. Although there is a remarkable number of search engines and academic databases, we focus on scholarly and comprehensive research databases covering fields where spatial crime prediction is a representative topic. We considered the following databases: Web of Science by Clarivate Analytics (WoS), Institute of Electrical and Electronics Engineers (IEEE) Xplore, ScienceDirect by Elsevier (SD), and the Association for Computing Machinery (ACM) Digital Library. We consider that an optimal search process should include multiple academic search databases, with searches being carried out at the best level of detail possible. In addition, as also discussed by Bramer et al. (2017) in an exploratory study of database combinations, if the research question is more interdisciplinary, a broader science database such as Web of Science is likely to add value. With regard to Google Scholar (GS), there are divergent opinions among researchers as to whether GS brings relevant information to an interdisciplinary review or not. Holone (2016) discusses that some search engines, specifically GS, have a tendency to selectively expose information by using algorithms that personalize information for the users, calling this the filter bubble effect. Haddaway et al. (2015) found that when searching for specific papers, the majority of the literature identified using Web of Science was also found using GS. However, their findings showed moderate to poor overlap in results when similar search strings were used in Web of Science and GS (10-67%), and that GS missed some important literature in five of six case studies.

In each database, we used keywords on singular and plural word versions (e.g., crime hotspot/s). For WoS, we used the advanced search option, by looking for papers written in English and matching our keywords with the topic or title. For IEEE, we searched for our keywords in the metadata or papers’ titles. In SD and ACM, we used the advanced search option with Boolean functions that searched our keywords in the title, abstract, or paper’s keywords. The identified papers were integrated directly into the free reference manager Mendeley. Last, we removed duplicates within each database, which resulted in 786 papers for the second phase, the Screening phase. The last search in the Identification phase was run on 5 February 2019.

While the use of statistical and geostatistical analyses for crime forecasting has been considered for quite some time, during the last two decades there has been an increasing interest in developing tools that use large data sets to make crime predictions (Perry 2013). Thus, predictive analytics have been included in law enforcement practices (Brayne 2017). This is the main reason that, during the Screening phase, we first excluded papers published before 2000. Second, we removed duplicates across the four selected databases (WoS, IEEE, SD, and ACM). Third, we screened all papers to identify the "non-relevant" ones. This decision was made by defining "relevant" papers to contain the following three elements. The first element is that a paper addresses crime events with explicit geographic boundaries. Common examples of excluded papers are the ones dealing with the fear of crime, offenders' characteristics, victims' characteristics, geographical profiling, journey to crime, and cyber or financial crime. The second element for a paper to be "relevant" is that it employs a forecasting algorithm and is not limited to exploratory or clustering analysis. The third element is that there is some form of spatial prediction. This means that there are predefined spatial units of analysis, such as inferencing for each census block of the study area. For the relevance elements, our strategy was the following: (a) read the title and screen figures and/or maps, (b) if unsure about relevance, read the abstract, (c) if still unsure about relevance, search for relevant words (e.g., geo*, location, spatial) within the document. The last step of the Screening phase was to remove relevant papers that the authors did not have access to, due to subscription restrictions. The Screening phase resulted in 193 relevant papers to be considered for the third and final phase.

During this final phase, the Eligibility phase, we read the abstract and main body of all 193 papers (e.g., study area, data, methods, and results). The focus was to extract the data items that compose a paper's eligibility criteria. These are grouped into three categories: (a) publication type, which is the first data item; (b) relevance, which consists of the data items relevance and purpose; and (c) study characteristics, which consists of the data items study area, sampling period, empirical data, and evaluation metrics. Next, we discuss each category and the data items it entails.

The first data item is the publication type. Literature reviews sometimes exclude conference papers because their quality is not evaluated like that of International Scientific Indexing (ISI) papers. However, for some disciplines, such as computer science, many conferences are considered highly reputable publication outlets. In the Screening phase, we found a large number of papers from computer or information science scholars, hence at this stage we decided not to exclude conference papers (n = 65) or non-ISI papers (n = 19). In total, we excluded nine papers that are book chapters or belong to other categories (e.g., editorial).

The next two "relevance" criteria (i.e., relevance and purpose) address the fit of the papers' content to our research scope. Paper relevance was checked again during this phase. For example, some papers that appeared to be relevant in the Screening phase (i.e., a paper is about crime events, space, and forecasting) were actually found not to be relevant after reading the core part of the paper. For instance, prediction was mentioned in the abstract, but what the authors implied was that prediction is a future research perspective of the analysis that was actually done in the paper. Also, we added the data item "purpose" to separate methods that model and explore relationships between the dependent and independent variables (e.g., crime attractors to burglaries) from those that perform a spatial forecast. The number of papers that were excluded due to these criteria amounted to 81.

Last, there are four more "study characteristics" criteria relevant to the quality and homogeneity of the selected papers. First, the study area should be equal to or greater than a city. Cities are less prone to edge effects compared to smaller administrative units within a city that share boundaries with other units (e.g., districts). In addition, the smaller the study area, the more likely it is that conclusions are tailored to the study characteristics and are not scalable. Second, the timeframe of the crime sample should be equal to or greater than a year to ensure that seasonality patterns were captured. These two items also increase the homogeneity of the selected studies. Yet, there are significant differences among studies that are discussed further in the "Results" section. The last two criteria are the restriction to analysing empirical data (e.g., proof of concepts or purely methodological papers were excluded) and to using measures that evaluate the models' performance (e.g., mean square error). These last two criteria ensure that we only analyse studies that are useful to address our research questions. The number of papers that were excluded due to the study characteristics criteria was 71. Furthermore, Fig. 1 shows the number of excluded papers for each data item (e.g., 17 papers were excluded due to insufficient size of the study area). Finally, the entire selection process yielded 32 papers.

Study quality

Two of the four authors of this research performed the selection of the papers to be analysed. Prior to each phase, these two authors discussed and designed the process, tested samples, and divided the workload. Then, results were merged, analysed, and discussed until both authors reached a consensus for the next phase. The same two authors crosschecked several of the results to ensure methodological consistency between them. The reading of the papers during the final phase (i.e., eligibility) was performed twice, by alternating the paper samples between the two authors, to ensure all eligible papers were included. In addition, in cases where some information on a paper's content was unclear to the two authors, they contacted the corresponding authors via email for clarifications.

Regarding the results subsections that constitute the four study stages ("Study characteristics", "Overview of selected publications on spatial crime forecasting", "Spatial crime forecasting methods", and "Considerations when analysing forecasting performance" sections), one or two authors performed each, and all authors contributed to extracting information and reviewing it. To extract information that is structured as data items, we followed a three-step procedure that was repeated at each stage. First, the papers were read by the authors to manually extract the data items and their values (1: extract). Data items and their values were then discussed and double-checked by the authors (2: discussion/consensus). In case information was still unclear, we contacted the corresponding authors via email for clarifications (3: consultation). This information was structured as a matrix where rows represent the papers and columns are several items of processed information (e.g., a data item is the year of publication). Table 1 shows the data items and the stage at which they were exploited. The attributes (values) of the items are discussed in the "Results" section.

The risk of bias in individual studies was assessed via the scale of the study. Spatial and temporal constraints were set (already defined in the eligibility phase) to ensure that we analyse medium to large scale studies and that findings are not tied to specific locality or seasonality characteristics. Furthermore, we did not identify duplicate publications (i.e., two or more papers with the same samples and experiments) and did not identify study peculiarities, such as special and uncommon features or research topics.

Last, the risk of bias across studies was assessed via an online survey. We contacted the authors of the publications (in some cases we could not identify contact details) and asked them to respond to a short survey regarding the results of their paper. The introductory email defined the bias across studies as follows: "Bias across studies may result from non-publication of full studies (publication bias) and selective publication of results (i.e., selective reporting within studies) and is an important risk of bias to a systematic review and meta-analysis". Then, we explained that the purpose of the survey is to identify whether there are non-reported results that are considerably different from the ones in their papers. This information was assessed via two questions (for further details we added the questionnaire as Additional file 1 of this paper). Out of the 32 papers, we received responses for 11 papers (n = 12, with two authors responding for the same paper). One factor that explains the low response rate is that many authors have changed positions (papers date back to 2001) and for some we could not identify their new contact details, while for others we received several non-delivery email responses.

Regarding the responses, seven authors stated that they never conducted a study similar to the one for which they were contacted, and five responded that they had conducted a similar study. A similar study was defined as a study in which: (a) the study design, selection of independent variables/predictors, selection of method(s), and parametrization of the method(s) are the same, and (b) the data can be different. Of those who performed a similar study, four responded that their results were not different and one responded that their results were considerably different. However, in a follow-up explanatory answer, this author responded that changing the parametrization yielded different results regarding the performance ranking of three algorithms, while the data and the study area were the same. Based on this small-scale survey, there is no indication of a risk of bias across studies. However, further investigation of this matter is needed.

Study characteristics

In this section, we discuss generic characteristics of the selected papers. To start with, the number of ISI journal articles (n = 18) is slightly higher than that of conference papers (n = 14). The 32 papers were published in a variety of journals and conferences and no preference was observed for a particular publication outlet. Specifically, four journals and one conference published two or three of the selected papers each (Table 2), and all other papers were published in different journals and conferences. On the other hand, there is little variation regarding the country of the study area. The majority of studies were conducted in the US, which is probably a somewhat biased statistic, considering the large US population size as well as the language (English) used in the study selection process. Similarly, the institutions that have published more than one paper on spatial crime forecasting are based in the US, with the exception of the Federal University of Rio Grande do Norte, Brazil, which has recent publications in this field.

We also collected the discipline associated with each paper. To do so, we used the affiliation of the first author and extracted the discipline down to the department level, where possible. Then we used as a benchmark reference the 236 categories/disciplines used in the Journal Citation Reports (JCR) by the Web of Science Group. Each author affiliation was then matched to one of the categories. In Table 2, we see the disciplines that appeared more than once (i.e., computer science, criminology, public administration, and geosciences). Although we collected a variety of disciplines, these are the ones that we encountered more than once, and they account for the majority of the papers (n = 22). Thus, scholars of these disciplines seem to have a greater interest in spatial crime forecasting.

Figure  2 shows the number of eligible and selected articles by year during the study selection period. We included the eligible in addition to the selected papers for two reasons. First, many of the eligible papers looked into spatial crime forecasting but did not meet the criteria defined for this study. Second, other papers may not be relevant to forecasting, but are relevant to the broader topics of prediction or modelling. The graph in Fig.  2 depicts a rapidly increasing trend over the last couple of years. For the eligible papers, the number of articles increased substantially since 2013, whereas for the selected papers, a similar trend is evident in the last 2 years.

Fig. 2 A yearly count of eligible and selected papers from 2001 to 2018

Overview of selected publications on spatial crime forecasting

In Table 3 we list each selected paper along with information related to space (i.e., study area and spatial scale), time (i.e., sampling period and period in months), crime data (i.e., crime type and crime sample size), and prediction (i.e., predicted information, task, spatial unit, and temporal unit). In this section, we consider these 10 data items as initial and basic information when reporting a spatial crime forecasting study. A reader who wants to replicate or modify the methodological approach presented in follow-up research will require the same 10 data items to assess whether such an approach is adequate for their own follow-up study and research objectives. More importantly, when any of these data items are missing, an assessment of the generalizability (or bias) of the conclusions and interpretation of results is limited. Unfortunately, the majority of the 32 selected papers (n = 21) had at least one item with undefined or unclear information among five of the 10 data items (Fig. 3). Of these, 52% (n = 11) were conference papers and 48% (n = 10) were ISI articles. On the other hand, 73% (n = 8) of the papers with no undefined or unclear information were ISI papers and 27% (n = 3) were conference papers.

Fig. 3 Percentages of all publications (n = 32) for describing basic information when reporting a spatial crime forecasting study. Blue: the item was properly defined; orange: the item was poorly defined or undefined

Most of the studies were conducted at the city level. In two studies, the forecasting area covered a county, which is the US administrative unit that usually expands across a city’s boundary. In one paper, predictions covered an entire country (US). New York City (US) was examined in four studies, Pittsburgh (US) was examined in three studies, and Portland (US), Natal (Brazil), and Chicago (US), were examined in two studies. All other publications were based on individual study areas.

The oldest crime data used in the 32 papers are from 1960 and the most recent crime data are from 2018. The sampling period ranges from 1 year up to 50 years. There is one paper with an undefined sampling period; however, from the article it can be inferred that the length of the sampling period is at least 1 year. The sample size of the crime data ranges from about 1400 incidents up to 6.6 million, which relates to the number of crime types as well as to the length of the sampling period. As for the number of crime types, four studies aggregated and analysed all crime types together, twelve studies focused on a particular crime type, fourteen studies looked at between two and 34 different crime types, and three studies had undefined information on the crime type analysed. Residential burglary was the crime type most often examined in studies that looked into only one crime type.

The last four data items in Table 3 describe details of the forecasted information, which we refer to as "inference". The temporal unit is the time granularity of the inference and ranges from a fine granularity of 3 h up to 1 year. The most frequent temporal unit across all papers is 1 month (used in 12 papers). In addition, day and week have been used in eight studies and years in seven studies. Other less frequent temporal units are 3 h, daytime for 1 month, night-time for 1 month, 2 weeks, 2 months, 3 months, and 6 months. Similarly, the spatial unit is the space granularity of the inference and ranges from a small area of 75 m × 75 m grid cell size to a large area, such as police precincts or even countries. The spatial unit is difficult to analyse and compare for two reasons. First, spatial units do not have a standard format like time and are two-dimensional; thus, they can vary in size and shape. Second, for about one-third of the papers this information was poorly reported (Fig. 3). In the case of administrative units (e.g., census blocks or districts), the shape and size usually vary, but if someone is looking for further details or the units themselves, these can in most cases be retrieved from administrative authorities. However, spatial clusters may also vary in shape and size, and if details are not reported (e.g., the direction of ellipses, the range of cluster sizes, the average size of clusters) it is difficult to quantify and replicate such clusters. We also encountered cases where authors report the dimensions of a grid cell without mentioning the units of measurement. Nevertheless, the grid cell seems to be the preferred type of spatial unit and is used in the majority of papers (n = 20).

The data items "inference" and "task" refer to the type of forecasted information and the predictive task, respectively. Inference and task are defined according to the information that the authors evaluated and not according to the output of a prediction algorithm. For example, an algorithm may derive crime intensity in space (i.e., the algorithm's output), which the authors then use to extract hotspots (i.e., the processed output to be evaluated) and evaluate with a classification measure such as precision, accuracy, or others. Some predictive methods, such as random forest, can be used for both classification and regression tasks. It is unclear why some authors choose to apply a regression application of a method and then process, derive, and evaluate a classification output, although they could directly apply a classification application of the same method. In addition, the inference "hotspots" in Table 3 includes the following four categories:

Concerning categories three and four, some authors refer to these areas as hotspots and others do not. We group all four categories together and define them as hotspots and non-hotspots, representing the output of a binary classification that separates space into areas for police patrolling that are alarming and non-alarming. We acknowledge that in the field of spatial crime analysis, hotspots are areas of high crime intensity. However, in our selected papers there does not seem to be a clear definition of the term “hotspot”.

The majority of the papers (n = 20) inferred hotspots as the outcome of a binary classification. Nine studies inferred the number of crimes or the crime rate in each spatial unit. However, three studies appear to be somehow different and unique from the rest. Huang et al. ( 2018 ) evaluated the forecasted category of crime as the output of a binary classification problem (e.g., is category A present in area X; yes or no). Ivaha et al. ( 2007 ) inferred the total number of crimes in a region, spatial clusters (or hotspots), and the share of crime within each cluster. Last, Rodríguez et al. ( 2017 ) evaluated the properties (i.e., location and size) of inferred clusters.

Spatial crime forecasting methods

The first three data items extracted for analysis in this section are the proposed forecasting method, the best proposed forecasting method, and the baseline forecasting method. The latter is the method used as a comparison benchmark for the proposed method. We analysed the frequency of the methods for each of these three items. The best proposed forecasting method is the one with the best performance throughout the conducted experiments. For example, if an experiment is evaluated separately on multiple types of crimes, we only count the method with the best performance for most cases. In case two methods perform similarly (as evidenced by statistical results and reported by the authors of the papers), both methods are considered. This was necessary because some papers proposed more than one method to be compared with a baseline method, but in the end these papers propose the method with the best performance. In addition, this reduces biased frequency counts of proposed methods. On the other hand, we considered as a baseline the method with which the authors wanted to compare the proposed methods. For instance, Zhuang et al. (2018) proposed three Deep Neural Networks and used an additional six machine learning algorithms as baseline methods to assess how much better the proposed methods were compared to the baselines. In this case, we counted the six machine learning algorithms as the baseline methods.

In Table 4, we show the "top" methods (i.e., frequently counted within the 32 selected papers) for each item. Random Forest (RF) is the most frequently proposed method. Multilayer Perceptron (MLP) appears as a top method in all three items (i.e., proposed, best, baseline). Other best proposed methods are Kernel Density Estimation (KDE)-based methods and Risk Terrain Modelling (RTM). Interestingly, Support Vector Machines (SVM) have been proposed in five papers but are not among the top four best proposed methods. On the other hand, plenty of well-known statistical models are preferred as baseline methods, such as Autoregressive (AR)-based models, Logistic Regression, the Autoregressive Integrated Moving Average (ARIMA) model, and Linear Regression, as well as KDE-based methods and K Nearest Neighbours. In Additional file 1C we added detailed tables that show, for each paper, the data items proposed method, best proposed method, and baseline method.

In the next sections, we categorize the proposed forecasting methods by type of algorithm ("Algorithm type of proposed forecasting methods" section) and by the type of inputs they take ("Proposed method input" section). This task was challenging because there is no scientific consensus on a single taxonomy or categorization of analytical techniques and methods. Papamitsiou and Economides (2014) reviewed papers in educational analytics, categorizing data mining methods into classification, clustering, regression, text mining, association rule mining, social network analysis, "discovery with models", visualization, and statistics. Other researchers would summarize all of these categories, for instance, as supervised learning, unsupervised learning, and exploratory data analysis. Vlahogianni et al. (2014) use different categorizations for reviewed papers in traffic forecasting, including aspects related to a model's approach to treating inputs and other properties relevant to splitting the proposed methodologies. The right granularity of properties to define a useful categorization can be problematic and particular to each field.

Algorithm type of proposed forecasting methods

Another suitable characteristic for classifying forecasting methods is the similarity between algorithms. We divide all algorithms used in the reported papers into (i) kernel-based, (ii) point process, (iii) traditional machine learning, and (iv) deep learning, according to the following criteria. Kernel-based algorithms are particularly concerned with finding a curve of crime rate \(\lambda\) for each place \(g\) that fits a subset of data points within the boundaries of a given kernel (see Eq. 1). We observe that the main difference among kernel-based algorithms is the use of different kernel types. Hart and Zandbergen (2014) experimented with different kernel types, providing some useful conclusions. Six of our selected papers used kernel-based algorithms, with the simple two-dimensional Kernel Density Estimation (KDE) being the most frequently used (n = 2). However, we observed that some methods are variations of the simple KDE model, in the form of the Spatio-Temporal KDE (STKDE) used in the paper by Hu et al. (2018), the Network-Time KDE (NTKDE) proposed by Rosser et al. (2017), or the dynamic-covariance KDE (DCKDE) proposed by Malik et al. (2014). We also considered the Exponential Smoothing method used in the paper by Gorr et al. (2003) as a kernel-based algorithm, since it works with a window function (kernel) on time series aggregation.
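To make the kernel-based category more tangible, the following is a minimal sketch, not taken from any of the selected papers, of fitting a simple two-dimensional KDE to crime point locations and flagging the highest-intensity grid cells as forecast hotspots. The synthetic coordinates, the Scott bandwidth rule, and the 95th-percentile cut-off are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative sketch: a simple 2D KDE over crime point locations.
# Coordinates are synthetic; a real study would use projected x/y of incidents.
rng = np.random.default_rng(0)
crime_xy = rng.normal(loc=[500.0, 500.0], scale=150.0, size=(1000, 2))

# gaussian_kde expects an array of shape (n_dims, n_points)
kde = gaussian_kde(crime_xy.T, bw_method="scott")  # bandwidth choice is a modelling decision

# Evaluate the estimated intensity surface on a regular grid of cell centroids
xs, ys = np.meshgrid(np.linspace(0, 1000, 50), np.linspace(0, 1000, 50))
grid = np.vstack([xs.ravel(), ys.ravel()])
density = kde(grid).reshape(xs.shape)

# Cells in the top 5% of estimated density could be flagged as forecast hotspots
threshold = np.quantile(density, 0.95)
hotspots = density >= threshold
print(hotspots.sum(), "of", hotspots.size, "cells flagged")
```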

Point processes can be distinguished from kernel-based algorithms insofar as they consider a background rate factor \(\mu\) that can be calculated stochastically, such as with a Poisson process. The background rate factor includes the modelling of covariates or features of the place \(g\), such as demographic, economic, or geographic variables (see Eq. 2). From the explanation made by Mohler (2014), we suppose that the introduction of the background rate makes the point process more suitable for multivariate modelling when compared to kernel-based algorithms. In the reviewed papers, these algorithms can be distinguished from each other based on their mathematical formulations of \(\kappa\) and \(\mu\), but also on their internal parameter selection, mostly based on likelihood maximization. Only three papers proposed such an algorithm, including the Marked Point Process from Mohler (2014), the maximum likelihood efficient importance sampling (ML-EIS) from Liesenfeld et al. (2017), and the Hawkes Process from Mohler et al. (2018).
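Equations 1 and 2 of the respective papers are not reproduced here; as an assumed illustration of the distinction, a self-exciting (Hawkes-type) conditional intensity that is consistent with the background rate \(\mu\) and triggering kernel \(\kappa\) discussed above can be written as:

```latex
% Assumed general form of a self-exciting spatio-temporal intensity for place g and time t:
% a background rate \mu(g) plus triggering contributions \kappa from past events (g_i, t_i).
\lambda(g, t) = \mu(g) + \sum_{i\,:\,t_i < t} \kappa\left(g - g_i,\; t - t_i\right)
```

In this reading, kernel-based methods estimate the intensity from a kernel sum alone, whereas point-process methods add the stochastically modelled background rate.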

In the case of machine learning algorithms, their formulation is often associated with finding a function \(f\) that maps feature vectors X to a given output Y. These algorithms are distinguished from each other in the way this function is estimated, some being more accurate and complex than others. We include in this category all algorithms that are explicitly associated with regression or classification. They differ from algorithms of the previous categories because \(f\) is constructed only after a training process. This training step aims to find a model that minimizes the estimation error between the predicted output and the original output. The majority of the reported papers (n = 20) were included in this class of algorithms. The most frequently proposed traditional machine learning algorithms were RF and MLP (tied at n = 6), followed by SVM together with Logistic Regression (n = 4), and Negative Binomial Regression, used in RTM studies (n = 3).
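As a hypothetical illustration of this training-based formulation (it does not reproduce any specific study), the sketch below fits a Random Forest classifier that maps synthetic grid-cell features X to a binary hotspot label Y; the feature semantics, sample sizes, and thresholds are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Illustrative sketch: learn f mapping grid-cell features X to a hotspot label Y.
# Features (e.g. lagged crime counts per cell) are synthetic placeholders.
rng = np.random.default_rng(42)
X = rng.poisson(lam=2.0, size=(5000, 4))                        # e.g. counts in previous 4 months
y = (X.sum(axis=1) + rng.normal(0, 2, 5000) > 10).astype(int)   # 1 = hotspot next month

# Simple train/test split; in practice the split should respect the time dimension
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("F1-score:", round(f1_score(y_test, model.predict(X_test)), 3))
```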

Although deep learning algorithms have the same formulation as traditional machine learning algorithms, they present a much more complex internal structure that affects their use. The deep layer structure means that the computational budget is mainly needed during training. Additionally, the need for samples is also greater than for the other approaches. Among the reported papers, the three that used this type of algorithm argue that it has the best overall performance. This includes the Deep Neural Networks (DNN) fitted by Lin et al. (2018), the DeepCrime framework from Huang et al. (2018), and the Long Short-Term Memory (LSTM) architecture proposed by Zhuang et al. (2017). The paper by Huang et al. (2018) even presents a neural architecture dedicated to a feature-independent approach, with a recurrent layer dedicated to encoding the temporal dependencies directly from the criminal occurrences. Still, none of these papers has discussed computational time performance against other algorithms, nor the sample sizes sufficient to obtain accurate models. At the time of this writing, we argue that there is no clear guidance on when one should choose a deep neural network approach, although in recent years evidence of its effectiveness has begun to emerge.
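For orientation only, the following is an assumed, minimal sketch of a recurrent architecture of the kind discussed above; it is not the cited authors' model. An LSTM reads a sequence of (scaled) per-cell crime counts and outputs a hotspot probability for the next period.

```python
import torch
import torch.nn as nn

# Minimal, assumed sketch: an LSTM reads a sequence of weekly crime counts per grid
# cell and predicts whether the next week will be a hotspot (binary output).
class CellLSTM(nn.Module):
    def __init__(self, n_features=1, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                          # x: (batch, seq_len, n_features)
        _, (h_n, _) = self.lstm(x)                 # h_n: (1, batch, hidden)
        return torch.sigmoid(self.head(h_n[-1]))   # hotspot probability per cell

model = CellLSTM()
dummy_sequences = torch.rand(8, 52, 1)             # 8 cells, 52 weeks of scaled counts
print(model(dummy_sequences).shape)                # torch.Size([8, 1])
```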

Proposed method input

Another split factor is the input of the forecasting methods, i.e., the independent variables. Some forecasting methods accept as input the latitude, longitude, and timestamp of criminal events (raw data), while others need explicit aggregations or transformations before their models can be fed. In this paper, we refer to feature engineering as the process of crafting, scaling, and selecting covariates or features to better explain a prediction variable, which often requires considerable domain-specific knowledge. An example is the aggregation of criminal events into spatiotemporal series, which can be decomposed into autoregressive lags and used as features. Feature engineering can also be applied to ancillary data not directly related to crime. For instance, Lin et al. (2018) count landmarks on the grid by counting the number of items in each cell (spatial aggregation) and craft a new feature for each landmark type, while Huang et al. (2018) define part of their algorithm as a region embedding layer that processes only the raw locations of the city's landmarks. We believe that the split factor by method inputs may be useful information for a potential researcher who wishes to perform spatial forecasting and consults this section of our paper. Data processing requires domain knowledge, and it is an expensive (timewise) task, especially when dealing with large spatiotemporal datasets. Thus, avoiding the feature-engineering process may be preferable for some researchers. On the other hand, one may prefer to use data to derive variables with particular patterns.

We call methods that have an internal approach to aggregating crime events into spatiotemporal variables "feature-engineering independent", and methods that do not "feature-engineering dependent". In other words, the latter explicitly need aggregations that derive spatiotemporal variables from the raw data independently of the forecasting algorithm. The majority (n = 24) of the reported papers have an explicit way to transform their crime events, as well as ancillary data, into features to feed their algorithms (i.e., they are feature-engineering dependent). Although we found many different forms of data aggregation into features, both spatially and temporally, the procedure of assigning features is often insufficiently reported, making it difficult to reproduce the proposed methodology. Still, well-defined workflows or frameworks followed by feature-engineering dependent methods were detailed in Malik et al. (2014) and Araújo et al. (2018). They synthesized their forecasting methods as (1) aggregate raw data spatially, following a crime mapping methodology (e.g., counting events inside grid cells), (2) generate time series and their features, (3) fit a forecasting model using an algorithm, and (4) visualize the results. In feature-engineering dependent methods the aggregation and time series generation are done separately, as processing steps before fitting a model, whereas this is not needed for feature-engineering independent methods.
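A minimal sketch of steps (1) and (2) of that workflow, assuming point events with projected coordinates and a date column; the grid cell size, column names, and lag choices are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative sketch of a feature-engineering dependent workflow:
# (1) aggregate raw events spatially and temporally, (2) build autoregressive lag features.
rng = np.random.default_rng(1)
events = pd.DataFrame({
    "x": rng.uniform(0, 4000, 10_000),
    "y": rng.uniform(0, 4000, 10_000),
    "date": pd.to_datetime("2017-01-01") + pd.to_timedelta(rng.integers(0, 730, 10_000), "D"),
})

cell = 500  # grid cell size in metres (illustrative)
events["cell_id"] = (events["x"] // cell).astype(int).astype(str) + "_" + \
                    (events["y"] // cell).astype(int).astype(str)
events["month"] = events["date"].dt.to_period("M")

# (1) spatial + temporal aggregation: crime counts per cell per month
counts = events.groupby(["cell_id", "month"]).size().rename("count").reset_index()

# (2) autoregressive lag features per cell (previous 1-3 months)
counts = counts.sort_values(["cell_id", "month"])
for lag in (1, 2, 3):
    counts[f"lag_{lag}"] = counts.groupby("cell_id")["count"].shift(lag)

print(counts.dropna().head())
```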

Considerations when analysing forecasting performance

In this section, we look at measures of forecasting performance (“ Overview of evaluation metrics ” section) and discuss which are used for each forecasting task, including for classification and regression (“ Metrics by forecasting task ” section). Then, we explore validation strategies by types of algorithms (“ Algorithms and validation strategies ” section). Finally, we summarize and discuss the main dependencies and limitations of the above subsections (“ Dependencies and limitations ” section).

Overview of evaluation metrics

As mentioned in "Spatial crime forecasting methods" section, the selected papers include forecasting baseline models, novel models, or ensemble models proposed by the respective authors. Evaluation metrics of such models are, in general, well known in criminology, GIScience, mathematics, or statistics. However, it is important to mention that few authors highlight the necessity of combining or using diverse evaluation metrics.

We cannot compare all evaluation results across the 32 papers for various reasons, such as the different spatial and temporal units, study areas, or forecasting methods applied. Yet, we can discuss certain similarities between them. Choosing an evaluation metric is highly dependent on the main prediction outcome, such as counts (e.g., for a Poisson distribution), normalized values or rates (e.g., for a continuous Gaussian distribution), or a binary classification (crime is absent or present). The most frequent evaluation metrics used in the selected papers are the Prediction Accuracy (PA, n = 9), followed by the Prediction Accuracy Index (PAI, n = 7), the F1-Score (n = 7), Precision and Recall (n = 5), the Mean Squared Error (MSE, n = 4), the Root Mean Squared Error (RMSE, n = 3), R-squared (n = 3), the Recapture Rate Index (RRI, n = 3), the Hit Rate (n = 2), the Area Under the Curve (AUC, n = 2), and the Mean Absolute Forecast Error (MAFE, n = 2). Some additional metrics are used only once, namely the Spatio-Temporal Mean Root Square Estimate (STMRSE), the average RMSE, the Regression Accuracy Score (RAS), the Regression Precision Score (RPS), the Ljung-Box test, the Mean Absolute Error (MAE), the Mean Absolute Percentage Error (MAPE), macro-F1, micro-F1, the Mean (Squared) Forecast Error (M(S)FE), the Pearson Correlation Coefficient, and the Nash coefficient. Generally, metrics derived from the confusion matrix, namely accuracy, precision, recall, and F1-Score, are used together to evaluate binary classifications.

We analysed the top three evaluation metrics (PA, PAI, F1-Score) in relation to their distribution among the data items of discipline, proposed forecasting algorithm type, forecasting inference, forecasting task, spatial unit, and temporal unit. Interestingly, we found that computer scientists exclusively use the PA, while criminologists prefer to apply the PAI. In addition, while the PA and the F1-Score have been preferably tested for short-term predictions (i.e., less or equal to 3 months), the PAI has been used for both short and long-term predictions. No other obvious pattern was detected among the other information elements regarding the usage and preference of these evaluation metrics.
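As a quick reference, the sketch below shows how two of the criminology-specific measures mentioned above, the hit rate and the PAI, are commonly computed; the definitions follow Chainey et al. (2008) and the numbers are invented for illustration.

```python
def hit_rate(crimes_in_flagged, total_crimes):
    """Share of future crimes that fall inside the flagged (hotspot) area."""
    return crimes_in_flagged / total_crimes

def pai(crimes_in_flagged, total_crimes, flagged_area, total_area):
    """Prediction Accuracy Index: the hit rate divided by the share of the
    study area that was flagged as hotspot (Chainey et al. 2008)."""
    return hit_rate(crimes_in_flagged, total_crimes) / (flagged_area / total_area)

# Illustrative numbers: 120 of 400 crimes fall in hotspots covering 5% of the city
print(hit_rate(120, 400))                                   # 0.30
print(pai(120, 400, flagged_area=5.0, total_area=100.0))    # 0.30 / 0.05 = 6.0
```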

Metrics by forecasting task

The most common forecasting task is binary classification (n = 21), for crime hotspots (n = 20) and for the category of crime (n = 1). While the classification task is frequently discussed at the beginning of the experiments, some articles evaluate a different item than the output of the algorithm, thus transforming regression products into binary values. The most prominent examples include RTM models (Drawve et al. 2016; Dugato et al. 2018; Gimenez-Santana et al. 2018), where the output of the algorithm is a risk score. This score is later reclassified into a binary outcome (a positive or negative risk score) for the purpose of the evaluation. In addition, Rummens et al. (2017) propose a combined ensemble model consisting of LR and MLP that infers risk values, similar to RTM, where the authors consider as crime hotspots the cells with a risk value higher than 20%.
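A hypothetical sketch of this reclassification step, thresholding a continuous risk surface into a binary hotspot forecast and scoring it with confusion-matrix metrics; the 20% cut-off echoes Rummens et al. (2017) and all values are synthetic.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Illustrative sketch: a continuous risk score per cell is thresholded into a
# binary hotspot forecast before being compared against observed crime presence.
rng = np.random.default_rng(7)
risk = rng.random(2000)                        # predicted risk per grid cell (0-1)
observed = (rng.random(2000) < risk * 0.4)     # synthetic "crime occurred" labels

predicted_hotspot = risk > 0.20                # reclassify regression output to binary
print("precision:", round(precision_score(observed, predicted_hotspot), 3))
print("recall   :", round(recall_score(observed, predicted_hotspot), 3))
print("F1-score :", round(f1_score(observed, predicted_hotspot), 3))
```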

The regression task (n = 11) is largely used for predicting the number of crimes (n = 8), and performance is measured by various error measures, such as the MSE (n = 4) or the RMSE (n = 3). Araujo et al. ( 2017 ) propose two new evaluation metrics, namely the Regression Accuracy Score (RAS), representing the percentage of success in predicting a sample, and the Regression Precision Score (RPS), which defines the precision of the RAS. The RPS measures the MSE of successfully predicted samples normalized by the variance of the training sample (Araujo et al. 2017 ). Rodríguez et al. ( 2017 ) introduce the Nash–Sutcliffe Efficiency (NSE), which they adopt from hydrological forecasting models, as a normalized statistic determining the relative magnitude of the residual variance compared to the variance of the measured data.
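As a rough illustration of the NSE as a regression evaluation metric, here is our own sketch based on the standard hydrological definition, not the exact implementation of Rodríguez et al. (2017):

```python
import numpy as np

def nash_sutcliffe(observed, predicted):
    """NSE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2).
    1 indicates a perfect fit; 0 means the model is no better than the observed mean."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    residual_var = np.sum((observed - predicted) ** 2)
    observed_var = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual_var / observed_var

# Dummy weekly crime counts per cell (observed) and the corresponding forecasts
obs  = [4, 7, 2, 9, 5, 3]
pred = [5, 6, 2, 8, 4, 4]
print(round(nash_sutcliffe(obs, pred), 3))
```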

However, the number of crimes is not the only inference considered in regression models. For example, Ivaha et al. ( 2007 ) predict the percentage of crime in clusters, using spatial ellipses as spatial units; Rodríguez et al. ( 2017 ) investigate properties of clusters; and Shoesmith ( 2013 ) infers crime rates from historical crime data.

In addition to the above-mentioned evaluation metrics, three articles discuss surveillance plots for prediction evaluation. Mohler ( 2014 ) uses a surveillance plot metric showing the fraction of crime predicted over a time period versus the number of grid cells with real crimes for each day (Fig.  4 a). The same author notes that this metric is similar to the receiver operating characteristic curve, or ROC curve, applied by Gorr ( 2009 ), but differs because it does not use an associated false positive rate on the x-axis. Hu et al. ( 2018 ) apply the PAI curve, also a surveillance plot, showing the area percentage on the x-axis and the PAI or the hit rate value on the y-axis (Fig.  4 b, c). Similarly, Rosser et al. ( 2017 ) use hit rate surveillance plots, representing the mean hit rate against the coverage for the network- and grid-based prediction approaches (Fig.  4 c). These plots are highly useful for visualizing metric values over the surveyed territory.

figure 4

Comparable surveillance plots for evaluation metrics visualization in space (using dummy data). a ROC-like accuracy curve, b PAI curve, and c Hit rate curve
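In principle, curves like those in Fig. 4 can be reproduced by ranking cells by predicted risk and accumulating hits as the flagged area grows. The sketch below (our own simplification, on dummy data) computes the hit rate and the PAI at increasing area coverage, which could then be plotted as in Fig. 4b, c:

```python
import numpy as np

def surveillance_curve(risk, crimes, area=None):
    """Hit rate and PAI as a function of area coverage.

    risk   : predicted risk score per cell
    crimes : observed crime count per cell in the evaluation period
    area   : cell areas (defaults to equal-size grid cells)
    """
    risk, crimes = np.asarray(risk, float), np.asarray(crimes, float)
    area = np.ones_like(risk) if area is None else np.asarray(area, float)
    order = np.argsort(-risk)                       # highest-risk cells first
    hits = np.cumsum(crimes[order]) / crimes.sum()  # fraction of crime captured
    coverage = np.cumsum(area[order]) / area.sum()  # fraction of area flagged
    pai = hits / coverage                           # PAI = hit rate / area share
    return coverage, hits, pai

# Dummy 10-cell study area
risk   = [0.9, 0.1, 0.4, 0.8, 0.2, 0.05, 0.7, 0.3, 0.6, 0.15]
crimes = [5,   0,   1,   4,   0,   0,    3,   1,   2,   0]
coverage, hit_rate, pai = surveillance_curve(risk, crimes)
for c, h, p in zip(coverage, hit_rate, pai):
    print(f"coverage={c:.1f}  hit rate={h:.2f}  PAI={p:.2f}")
```

Plotting the hit rate or the PAI against coverage gives a curve from which police could read off the coverage level that balances accuracy against patrolling effort.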

Algorithms and validation strategies

As mentioned in “ Spatial crime forecasting methods ” section, in many of the papers the proposed forecasting method does not include a novel algorithm, but mostly applies new variables that have not been used before. Recalling the four types of algorithms, namely (i) kernel-based, (ii) point process, (iii) traditional machine learning, and (iv) deep learning, we note diversity between the proposed forecasting methods and the baseline methods. Validation strategies are diverse as well. Half of the studies (n = 16) split the data into training and testing subsets. Most of these studies use 70% training (current) and 30% testing (future) sets. Johansson et al. ( 2015 ) use a combined approach that includes a rolling horizon, producing ten samples for the KDE algorithm, each containing 70% of the original crime dataset (keeping the 70/30 ratio); the final result is calculated as the mean of the ten measurements. Figure  5 gives an overview of all algorithms and their validation strategies. This decision tree visualization includes five central data items, namely prediction task, proposed input forecasting method, proposed forecasting algorithm type, validation strategy, and evaluation metrics. Classification m refers to those evaluation metrics that are particularly used for classification tasks (e.g., PA, F1-score). Regression m is a composition of error metrics for regression analysis (e.g., MSE, RMSE, MAE), while Criminology m includes crime analysis metrics (e.g., PAI, RRI).

figure 5

Overview of forecasting methods (see “ Spatial crime forecasting methods ” section) and their performance evaluation (see “ Considerations when analysing forecasting performance ” section) linked to the 32 selected papers. The papers’ references linked to their number are shown in Table  3 . The letter m denotes an evaluation metric. The letter “U” denotes an undefined item
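To make the two dominant validation strategies concrete, the sketch below contrasts a 70/30 temporal train-test split with a rolling-horizon loop; it uses dummy monthly data and a generic regressor and is not taken from any of the reviewed studies:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
months = 36
X = rng.normal(size=(months, 5))        # dummy monthly features for one study area
y = rng.poisson(lam=10.0, size=months)  # dummy monthly crime counts

# (1) Temporal train-test split: fit on the earlier 70%, evaluate on the later 30%
split = int(0.7 * months)
model = RandomForestRegressor(random_state=0).fit(X[:split], y[:split])
mse_split = float(np.mean((model.predict(X[split:]) - y[split:]) ** 2))

# (2) Rolling horizon: repeatedly train on all months before t and forecast month t
errors = []
for t in range(split, months):
    model = RandomForestRegressor(random_state=0).fit(X[:t], y[:t])
    errors.append((model.predict(X[t:t + 1])[0] - y[t]) ** 2)
mse_rolling = float(np.mean(errors))

print(f"train-test MSE: {mse_split:.2f}   rolling-horizon MSE: {mse_rolling:.2f}")
```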

Kernel-based algorithms are mostly used to predict hotspots (n = 5) and the number of crimes (n = 1). Interestingly, Malik et al. ( 2014 ) point out that regions with similar demographics tend to show similar trends for certain crime types. This observation is included in their prediction model, the Dynamic Covariance Kernel Density Estimation (DCKDE) method, which is compared with the “Seasonal Trend decomposition based on Loess (STL)” baseline model. Hart and Zandbergen ( 2014 ) and Johansson et al. ( 2015 ) use a kernel-based KDE approach without comparing it with a baseline method, both considering the PAI as one of the evaluation metrics. Only two of the kernel-based studies consider ancillary data (Gorr et al. 2003 ; Rosser et al. 2017 ), yet the two use different validation strategies (rolling horizon and train-test split, respectively) and evaluation metrics (MAE, MSE, and MAPE in the first publication and the Hit Rate in the second publication). Thus, it is worth noting that, while using the same base algorithm, such as KDE, other components of the prediction process may differ.
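A minimal kernel-based hotspot forecast in the spirit of these KDE studies could look as follows; this is our own sketch using SciPy's Gaussian KDE on dummy incident coordinates, whereas the reviewed papers differ in bandwidth choice, grid resolution, and evaluation:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# Dummy historical incident coordinates in a unit-square study area, shape (2, n)
incidents = rng.random((2, 300))

kde = gaussian_kde(incidents)  # bandwidth set by Scott's rule (the SciPy default)

# Score a regular grid and flag the top 5% of cells as forecasted hotspots
gx, gy = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
density = kde(np.vstack([gx.ravel(), gy.ravel()]))
threshold = np.quantile(density, 0.95)
hotspots = density >= threshold  # binary hotspot surface for the next period

print(f"{hotspots.sum()} of {hotspots.size} grid cells flagged as hotspots")
```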

Two of the three point process studies do not explain the validation strategy they follow (Liesenfeld et al. 2017 ; Mohler 2014 ). Mohler ( 2014 ) presents an interesting point process approach using only historical crime data, capturing both short-term and long-term patterns of crime risk. This article includes the surveillance plot evaluation (see “ Metrics by forecasting task ” subsection), comparing chronic and dynamic hotspot components for homicides and all crime types.

The third category of forecasting algorithms, traditional ML, is split almost equally between classification and regression tasks. Only three articles discussing traditional ML algorithms do not provide information about the baseline comparison (Araújo et al. 2018 ; Rodríguez et al. 2017 ; Rummens et al. 2017 ). The majority of ML algorithms (n = 11) use the training–testing split validation strategy applied to the classification task. Interestingly, one of the articles (Yu et al. 2011 ) discusses a different validation approach, “Leave-One-Month-Out” (LOMO), in which, instead of running the classification only once on the training and testing data sets, it is run on S − 1 sets (S = number of sets/months).
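A leave-one-month-out loop in the spirit of Yu et al. (2011) could be sketched as follows; this is our own simplified version with dummy data and a generic classifier, not their exact features or algorithm:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n_months, n_cells = 12, 200
X = rng.normal(size=(n_months, n_cells, 4))       # dummy features per cell and month
y = rng.integers(0, 2, size=(n_months, n_cells))  # dummy hotspot labels per cell and month

scores = []
for test_month in range(1, n_months):             # S - 1 runs: each later month held out once
    train = [m for m in range(n_months) if m != test_month]
    X_train = X[train].reshape(-1, X.shape[-1])
    y_train = y[train].ravel()
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    scores.append(clf.score(X[test_month], y[test_month]))  # prediction accuracy

print(f"LOMO mean accuracy over {len(scores)} held-out months: {np.mean(scores):.3f}")
```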

An increasing body of forecasting techniques is based on DL; however, this review includes only three such articles, all of them addressing short-term prediction and coming from the computer science discipline (Huang et al. 2018 ; Lin, Yen, and Yu 2018 ; Zhuang et al. 2017 ). Two of the three articles consider geographic ancillary variables and apply the rolling-horizon validation strategy, while the third article deals only with crime lags following a 10-fold cross-validation approach. All three articles consider a binary classification evaluated by metrics such as the PA and the F1-score. Zhuang et al. ( 2017 ) propose a spatio-temporal neural network (STNN) for forecasting crime occurrences while embedding spatial information. They then compare the STNN with three state-of-the-art methods, namely the Recurrent Neural Network (RNN), the Long Short-Term Memory (LSTM), and the Gated Recurrent Unit (GRU). Since the model is designed for all types of crime data, each crime type can lead to a different STNN performance due to its variability in time and space. Presumably, challenges will appear for crime types with low data volumes, because neural networks require a sufficient amount of data for training.
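To make the sequence-based DL setting more tangible, the following is a deliberately minimal PyTorch sketch of an LSTM that classifies a cell as hotspot or non-hotspot from a short history of crime-count lags; it is a toy illustration of ours and is much simpler than STNN, DeepCrime, or the other reviewed architectures:

```python
import torch
import torch.nn as nn

class HotspotLSTM(nn.Module):
    """Binary hotspot classifier from a sequence of past crime-count features."""
    def __init__(self, n_features=1, hidden=16):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (batch, time steps, features)
        _, (h_n, _) = self.lstm(x)              # last hidden state summarises the sequence
        return self.head(h_n[-1]).squeeze(-1)   # raw logit per cell

# Dummy data: 64 cells, 8 weekly crime-count lags, binary hotspot labels
torch.manual_seed(0)
x = torch.randn(64, 8, 1)
y = (torch.rand(64) > 0.7).float()

model = HotspotLSTM()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(50):                         # short training loop on the toy data
    optimiser.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimiser.step()

pred = (torch.sigmoid(model(x)) > 0.5).float()
print(f"training accuracy on the toy data: {(pred == y).float().mean():.2f}")
```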

Dependencies and limitations

Although most papers use standard evaluation metrics, such as the PA for a binary outcome, they usually do not include complementary metrics to ensure that every aspect of the prediction performance is covered. Often, the PA is used by itself to measure model performance (Araújo et al. 2018 ; Malik et al. 2014 ; Mu et al. 2011 ). Complementary metrics are needed because, although one method may score higher than others on a single metric, additional metrics can reveal other aspects of performance. For example, a model showing a high PAI may nevertheless have a low Prediction Efficiency Index (PEI) value (Hunt 2016 ). The PEI is another evaluation metric, calculated as the ratio of the PAI to the maximum possible PAI a model can achieve. The difference between the PAI and the PEI can be explained by the two metrics having different dependencies on the cell size.
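The PAI/PEI relationship can be illustrated as follows (our own sketch, assuming equal-size grid cells and following Hunt's (2016) definition only loosely):

```python
import numpy as np

def pai_pei(crimes, flagged):
    """PAI and PEI for a binary hotspot forecast on an equal-size grid.

    crimes  : observed crime count per cell
    flagged : boolean array, True where the model flags a hotspot
    """
    crimes, flagged = np.asarray(crimes, float), np.asarray(flagged, bool)
    hit_rate = crimes[flagged].sum() / crimes.sum()
    area_share = flagged.sum() / flagged.size
    pai = hit_rate / area_share
    # Best achievable PAI with the same area: flag the cells with the most crime
    k = flagged.sum()
    best_hit_rate = np.sort(crimes)[::-1][:k].sum() / crimes.sum()
    pai_max = best_hit_rate / area_share
    return pai, pai / pai_max  # PEI = PAI / maximum possible PAI

crimes  = [5, 0, 1, 4, 0, 0, 3, 1, 2, 0]
flagged = [True, False, False, True, False, False, False, True, False, False]
pai, pei = pai_pei(crimes, flagged)
print(f"PAI = {pai:.2f}, PEI = {pei:.2f}")
```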

Complementary metrics also overcome limitations of some evaluation measures. For example, the PA is the sum of true positives and true negatives divided by the total number of instances, which represents the percentage classified correctly. However, this information may not be enough to judge the performance of a model, because it omits information about precision. The Hit Rate and the PAI are obtained through division; thus, when the denominator is small, both metrics are high. Consequently, when crime occurrences are low, results are heavily affected.

Furthermore, traditional metrics are global in nature, but in spatial prediction or forecasting we are also interested in the spatial distribution of the prediction. There may be local areas of good and bad prediction performance that average out into a single global value. A complementary metric for a regression outcome could be to calculate Moran’s I of the prediction error and explore the variation of the predictive performance throughout the study area. Ideally, the prediction error should follow a random spatial distribution. Overall, we find little to no interest in developing (local) spatial, temporal, or spatiotemporal evaluation metrics.
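A complementary spatial check along these lines could compute Moran's I of the residuals over the grid; the sketch below uses a hand-rolled Moran's I with rook-contiguity weights on dummy data (libraries such as PySAL/esda provide tested implementations):

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I of a variable, given a binary spatial weight matrix."""
    z = values - values.mean()
    num = np.sum(weights * np.outer(z, z))
    return (len(values) / weights.sum()) * (num / np.sum(z ** 2))

def rook_weights(rows, cols):
    """Binary rook-contiguity weights for a regular grid, cells indexed row-major."""
    n = rows * cols
    w = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w[i, rr * cols + cc] = 1
    return w

# Dummy 10 x 10 grid of prediction errors (observed minus predicted counts)
rng = np.random.default_rng(3)
residuals = rng.normal(size=100)
w = rook_weights(10, 10)
print(f"Moran's I of the residuals: {morans_i(residuals, w):.3f}")  # ~0 if spatially random
```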

The relevance of evaluation metrics may be biased for various reasons, one example being class imbalance: a model can have high accuracy because it predicts the locations without crime very well, while locations with crime are poorly forecasted. Some authors try to ameliorate the negative–positive ratio between crime and no-crime cells by adjusting the weight of hotspots and cold spots (Yu et al. 2011 ) or by changing the training set while the test set keeps its original, real data (Rumi et al. 2018 ). Another dependency is the different kinds of aggregation that take place during modelling, by time, space, or crime-type attributes. For instance, while the majority of papers report working with disaggregated crime types, some of them aggregate crime types into, e.g., “violent crimes”, without specifying which types are included. In addition, the effects that spatiotemporal aggregations have on the forecasting performance are typically not analysed, but could easily be examined with a sensitivity analysis.
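One common way to counter such imbalance is to reweight the classes during training while leaving the test set untouched; the following is a scikit-learn sketch of ours and not the exact procedure of Yu et al. (2011) or Rumi et al. (2018):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(4)
n_cells = 2000
X = rng.normal(size=(n_cells, 6))
# Dummy hotspot labels: rare (imbalanced) and weakly related to the first feature
p = 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 3.0)))
y = (rng.random(n_cells) < p).astype(int)

split = int(0.7 * n_cells)
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

plain    = LogisticRegression(max_iter=1000).fit(X_train, y_train)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_train, y_train)

# Accuracy looks high either way because "no crime" dominates;
# the F1-Score on the hotspot class exposes the difference
for name, clf in [("unweighted", plain), ("class-weighted", weighted)]:
    pred = clf.predict(X_test)
    print(f"{name:>14}:  accuracy={clf.score(X_test, y_test):.2f}  "
          f"F1(hotspot)={f1_score(y_test, pred, zero_division=0):.2f}")
```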

In this section, we perform a SWOT (strengths, weaknesses, opportunities, and threats) analysis of the most significant findings.

Strengths

One of the strongest elements of current research efforts is the incorporation of spatial or spatiotemporal information into traditional prediction algorithms. Examples of this approach are STAR and STKDE (Shoesmith 2013 ; Rosser et al. 2017 ). Also, KDE, a traditional method in the field, has been adapted to address sampling problems, such as sparse data (DCKDE), and alternative spatial representations, such as street networks (NTKDE) (Malik et al. 2014 ; Rosser et al. 2017 ). In addition, the interest of the scientific community in the incorporation and effect of big data in prediction is evident from the related work section. This interest is also supported by the trend of introducing dynamic variables into the modelling process, such as calculating visitor entropy from Foursquare or ambient population from social networks and transportation. Regarding the performance evaluation, surveillance plots (Fig.  4 ) provide a more detailed picture of the accuracy of the forecasted information. Since they include the area coverage on the x-axis, they can be used by the police as a decision tool to identify the threshold that balances prediction accuracy with the size of patrolling areas.

Weaknesses

Overall, significant details of study experiments are not always reported; commonly undefined items are the spatial unit of analysis and the sample size. Similarly, for methods that depend on feature engineering, the crafting procedures are not sufficiently described. These omissions make a study difficult to reproduce and its results hard to compare with those of a possible future study. Furthermore, we did not find any open source tools that implement spatial crime forecasting using the best-proposed methods reported. Such a tool could enhance the possibility of reproducing results from an existing forecasting study. We suggest that all data items analysed in “ Overview of selected publications on spatial crime forecasting ” section (see Table  3 for an overview) should always be reported. Moreover, a detailed “spatial forecasting protocol” could be developed, similar to protocols for other modelling approaches such as the ODD protocol (Grimm et al. 2010 ). Furthermore, the most common spatial unit is the grid cell, which may not necessarily align with the places to which policing resources are typically deployed. So far, we did not encounter a study that sufficiently addresses this issue. Regarding the performance evaluation, most authors use standard metrics. A “global” standard metric, such as the MAE, cannot describe the distribution of the prediction error across space, which can vary considerably. We thus propose to develop novel local spatial or spatiotemporal evaluation metrics. Finally, other modelling issues, such as overfitting, multi-collinearity, sampling bias, and data sparsity, are hardly discussed, if at all.
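For instance, the suggested data items could be collected in a small machine-readable record attached to each study; the field names below simply mirror the items above and in Table 3, while all values are hypothetical:

```python
# Hypothetical, minimal "spatial forecasting protocol" record; the keys follow the
# data items suggested in this review, and every value is an illustrative assumption.
study_report = {
    "study_area": "City of Example",
    "scale": "city",
    "sampling_period": "2015-2018",
    "months": 48,
    "crime_type": "residential burglary",
    "sample_size": 12450,
    "forecasting_inference": "hotspots",
    "forecasting_task": "binary classification",
    "spatial_unit": "200 m x 200 m grid cell",
    "temporal_unit": "week",
    "proposed_method": "RF with spatiotemporal lag features",
    "best_proposed_method": "RF (balanced class weights)",
    "baseline_method": "KDE",
    "evaluation_metrics": ["PA", "F1", "PAI"],
    "validation_strategy": "rolling horizon",
}
```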

Opportunities

There is a tremendous increase in spatial crime forecasting studies. From the pool of the 32 selected papers, 7 and 11 papers were published in 2017 and 2018, respectively, compared to about one paper per year between 2000 and 2016 (Fig.  2 ). This shows the growing interest of scholars from varying disciplines (compare Table  2 ) in this kind of research. The crime type that has been studied the most is residential burglary. It is unclear why this particular crime type, and property crimes in general, are more likely to be studied. A future opportunity could be to systematically test whether property crimes consistently yield better forecasts than other crime types, and why. Furthermore, except for RTM and KDE, other spatial methods mentioned in the related work section (“ Related work ” section) were not used by the selected papers. The reason may be that authors have varying backgrounds, such as computer science and criminology, and may not be familiar with such methods. This opens a research opportunity to explore and compare less used spatial methods with traditional approaches, such as RTM or KDE. Another opportunity would be to compare the forecasting performance of the methods among each other. In this review, we presented methodological trends, but a fair comparison among spatial methods was not possible. First, some methods were not compared to a baseline method. Other authors compared the same method with a different set of features. Even if there were papers with a similar set of features, a comparison among them would be biased due to variations in sample data, study areas, sampling periods, etc. Future empirical studies should focus on the comparison of algorithms, of which the number is constantly increasing. We merged the selected papers into four categories of forecasting algorithms: kernel-based, point process, traditional machine learning, and deep learning. Traditional machine learning algorithms were present in most proposed methods, with MLP and RF being the most common ones, while AR models were the most used baseline methods. A suggestion is to compare new or recently developed algorithms to the most frequently proposed ones, instead of continuing to conduct further comparisons with traditional or simpler methods that have repeatedly been shown to underperform.

We outlined that spatial crime forecasting studies lack coherent terminology, especially for terms such as “prediction”, “forecasting”, and “hotspots”. The predominant predictive task is binary classification (n = 21) and the predominant forecasting inference is hotspots (n = 20). It is important to understand the rationale behind this trend. Is regression-based prediction less useful or more difficult? Although we notice a constant increase in developing classification algorithms or features to be infused into the classification task, we acknowledge the importance of both prediction tasks. Also, to display an area’s complete crime picture, it is important to examine both hotspots and coldspots, or a multiclass classification towards the hottest crime spots. However, none of these was the focus of the examined papers. We acknowledge that forecasting hotspots is important for police to allocate resources. Nevertheless, what about the information that can be derived from other types of spatial groupings, such as coldspots, coldspot outliers, or hotspot outliers, commonly referred to as LL, LH, and HL (low–low, low–high, and high–low, respectively) and calculated by the local Moran statistic (Anselin 2005 )? Science needs to progress knowledge, which requires understanding and examining all aspects of a phenomenon. Finally, only a third of all papers performed long-term predictions. Although this trend is positive, because law enforcement has an interest in almost real-time prediction, long-term prediction should not be overlooked, as it plays an important role in understanding crime risk and provides a broad picture for strategic planning.
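Such groupings can be derived by classifying each areal unit into a quadrant based on its standardised value and its spatial lag, as in the local Moran statistic; the hand-rolled sketch below uses dummy data and omits the significance testing described by Anselin (2005), which packages such as esda implement in full:

```python
import numpy as np

def local_moran_quadrants(values, weights):
    """Classify each areal unit as HH, LL, LH, or HL from its value vs. its spatial lag."""
    z = (values - values.mean()) / values.std()
    w_std = weights / weights.sum(axis=1, keepdims=True)  # row-standardised weights
    lag = w_std @ z                                       # average of neighbours' z-values
    return np.where(z >= 0, np.where(lag >= 0, "HH", "HL"),
                            np.where(lag >= 0, "LH", "LL"))

# Dummy example: 5 areas with crime rates and a simple contiguity structure
crime_rates = np.array([80.0, 75.0, 10.0, 12.0, 70.0])
weights = np.array([[0, 1, 0, 0, 1],
                    [1, 0, 1, 0, 1],
                    [0, 1, 0, 1, 0],
                    [0, 0, 1, 0, 1],
                    [1, 1, 0, 1, 0]], dtype=float)
print(local_moran_quadrants(crime_rates, weights))
```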

Conclusions

In this paper, we focus on “Spatial Crime Forecasting”, which is an inference approach about crime in both time and space. We conducted a systematic literature review that follows the reporting guidance “PRISMA” (Liberati et al. 2009 ) to understand and evaluate the state of the art concerning concepts and methods in empirical studies on spatial crime forecasting. We addressed several research questions that deal with the role of space in the forecasting procedure, the methods used, the predictive performance, and, finally, model validation strategies.

We identified six types of inference, namely (1) hotspots (the majority of the papers), (2) number of crimes, (3) crime rate, (4) category of crime, (5) percentage of crime in clusters, and (6) properties of clusters. With regard to forecasting methods, the authors proposed mostly traditional machine learning methods, but also kernel density estimation based approaches and, less frequently, point process and deep learning approaches. When it comes to measuring performance, a plethora of metrics were used, with the top three being the Prediction Accuracy, followed by the Prediction Accuracy Index and the F1-Score. Finally, the most common validation approach was the train-test split, while other approaches include cross-validation, leave one out, and the rolling horizon.

This study was driven by the increasing publication of spatial crime forecasting studies (and crime predictive analytics in general). More than half of the selected papers (n = 32) were published in the last 2 years. Specifically, about one paper per year was published between 2000 and 2016, while 7 and 11 papers were published in 2017 and 2018, respectively. At the same time, there is a global growth of scientific publication outputs. Bornmann and Mutz ( 2015 ) fitted an exponential model to this growth and calculated an increasing rate of outputs of about 3% annually, with the volume estimated to double in approximately 24 years. Yet the yearly pattern of the selected papers shows a much greater increase, which indicates the importance and future potential of studies related to spatial crime forecasting.
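The doubling-time figure can be checked with a one-line calculation (our own arithmetic, using the standard exponential-growth relationship):

```python
import math
# Doubling time implied by ~3% annual growth: ln(2) / ln(1 + r)
print(math.log(2) / math.log(1.03))  # ~23.4 years, consistent with "approximately 24 years"
```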

Furthermore, we would like to outline the main limitations that may hinder reproducibility, and hence the advancement of this topic in the long term. First, the terminology being used is not consistent, possibly because scientists working on this topic have various backgrounds (e.g. criminology, computer science, geosciences, public policy, etc.). Second, significant details of study experiments are vaguely reported or not reported at all. With respect to the last point, we suggested reporting the following data items: study area, scale, sampling period, months, type, sample, inference, task, spatial unit, and temporal unit (in total 10 items). Additional items to be reported are the proposed method, best-proposed method, baseline method, evaluation metrics, and validation strategy (in total 5 items).

Availability of data and materials

The list of manuscripts used for this research is mentioned in Table  3 . If needed, the authors can provide the list of 193 manuscripts that went through the eligibility phase.

JCR: https://clarivate.com/webofsciencegroup/solutions/journal-citation-reports/ .

Al Boni, M., & Gerber, M. S. (2016). Predicting crime with routine activity patterns inferred from social media. In IEEE International Conference on Systems, Man and Cybernetics (SMC) , (pp. 1233–1238). https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7844410 .

Anselin, L. (2005). Exploring spatial data with GeoDaTM: A workbook . Santa Barbara: Center for Spatially Integrated Social Science.


Araújo, A., Cacho, N., Bezerra, L., Vieira, C., & Borges, J. (2018). Towards a crime hotspot detection framework for patrol planning. In 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS) , (pp. 1256–1263). https://doi.org/10.1109/HPCC/SmartCity/DSS.2018.00211 .

Araujo, A. J., Cacho, N., Thome, A. C., Medeiros, A., & Borges, J. (2017). A predictive policing application to support patrol planning in smart cities. In International Smart Cities Conference (ISC2) . https://www.researchgate.net/profile/Adelson_Araujo2/publication/321236214_A_predictive_policing_application_to_support_patrol_planning_in_smart_cities/links/5c068339299bf169ae316a6f/A-predictive-policing-application-to-support-patrol-planning-in-smart-ci .

Bernasco, W., & Elffers, H. (2010). Statistical analysis of spatial crime data. In A. R. Piquero & D. Weisburd (Eds.), Handbook of quantitative criminology (pp. 699–724). New York: Springer. https://doi.org/10.1007/978-0-387-77650-7_33 .


Bornmann, L., & Mutz, R. (2015). Growth Rates of Modern Science: A Bibliometric Analysis Based on the Number of Publications and Cited References. Journal of the Association for Information Science and Technology, 66 (11), 2215–2222.


Bowen, D. A., Mercer Kollar, L. M., Wu, D. T., Fraser, D. A., Flood, C. E., Moore, J. C., Mays E. W. & Sumner, S. A. (2018). Ability of crime, demographic and business data to forecast areas of increased violence. International journal of injury control and safety promotion , 25 (4), 443–448. https://doi.org/10.1080/17457300.2018.1467461 .

Bramer, W. M., Rethlefsen, M. L., Kleijnen, J., & Franco, O. H. (2017). Optimal database combinations for literature searches in systematic reviews: A prospective exploratory study. Systematic Reviews, 6 (1), 245.

Brantingham, P. J., & Brantingham, P. L. (1984). Patterns in crime . New York: Macmillan.

Brayne, S. (2017). Big data surveillance: The case of policing. American Sociological Review, 82 (5), 977–1008.

Brown, D. E., & Oxford, R. B. (2001). Data mining time series with applications to crime analysis. In 2001 IEEE International Conference on Systems, Man and Cybernetics. e-Systems and e-Man for Cybernetics in Cyberspace (Cat. No. 01CH37236), Vol. 3 (pp. 1453–1458). IEEE. https://doi.org/10.1109/ICSMC.2001.973487 .

Bruinsma, G. J. N., & Johnson, S. D. (2018). The oxford handbook of environmental criminology . Oxford: Oxford University Press.


Caplan, J. M., Kennedy, L. W., & Miller, J. (2011). Risk terrain modeling: brokering criminological theory and gis methods for crime forecasting. Justice Quarterly, 28 (2), 360–381. https://doi.org/10.1080/07418825.2010.486037 .

Chainey, S., Tompson, L., & Uhlig, S. (2008). The utility of hotspot mapping for predicting spatial patterns of crime. Security Journal., 21, 4–28.

Chauhan, C., & Sehgal, S. (2017). A review: Crime analysis using data mining techniques and algorithms. In P. N. Astya, A. Swaroop, V. Sharma, M. Singh, & K. Gupta (Eds.), 2017 IEEE International Conference on Computing, Communication and Automation (ICCCA) (pp. 21–25).

Chen, H. C., Chung, W. Y., Xu, J. J., Wang, G., Qin, Y., & Chau, M. (2004). Crime data mining: A general framework and some examples. Computer, 37 (4), 50–56. https://doi.org/10.1109/MC.2004.1297301 .

Cohen, J., Gorr, W. L., & Olligschlaeger, A. M. (2007). Leading indicators and spatial interactions: A crime‐forecasting model for proactive police deployment. Geographical Analysis , 39 (1), 105–127. https://doi.org/10.1111/j.1538-4632.2006.00697.x

Cressie, N. A. C. (1993). Statistics for spatial data. New York: Wiley. https://doi.org/10.2307/2533238 .

Dash, S. K., Safro, I., & Srinivasamurthy, R. S. (2018). Spatio-temporal prediction of crimes using network analytic approach. In 2018 IEEE International Conference on Big Data (Big Data) (pp. 1912-1917). IEEE. https://doi.org/10.1109/BigData.2018.8622041 .

Drawve, G., Moak, S. C., & Berthelot, E. R. (2016). Predictability of gun crimes: A comparison of hot spot and risk terrain modelling techniques. Policing & Society, 26 (3), 312–331. https://doi.org/10.1080/10439463.2014.942851 .

Dugato, M., Favarin, S., & Bosisio, A. (2018). Isolating target and neighbourhood vulnerabilities in crime forecasting. European Journal on Criminal Policy and Research, 24 (4 SI), 393–415. https://doi.org/10.1007/s10610-018-9385-2 .

Gerber, M. S. (2014). Predicting crime using twitter and kernel density estimation. Decision Support Systems., 61, 115–125. https://doi.org/10.1016/j.dss.2014.02.003 .

Gimenez-Santana, A., Caplan, J. M., & Drawve, G. (2018). Risk terrain modeling and socio-economic stratification: Identifying risky places for violent crime victimization in Bogota, Colombia. European Journal on Criminal Policy and Research, 24 (4 SI), 417–431. https://doi.org/10.1007/s10610-018-9374-5 .

Gorr, W. L. (2009). Forecast accuracy measures for exception reporting using receiver operating characteristic curves. International Journal of Forecasting, 25 (1), 48–61.

Gorr, W., & Harries, R. (2003). Introduction to crime forecasting. International Journal of Forecasting, 19. https://www.sciencedirect.com/science/article/pii/S016920700300089X .

Gorr, W., Olligschlaeger, A., & Thompson, Y. (2003). Short-term forecasting of crime. International Journal of Forecasting. https://www.sciencedirect.com/science/article/pii/S016920700300092X .

Grimm, V., Berger, U., DeAngelis, D. L., Polhill, J. G., Giske, J., & Railsback, S. F. (2010). The ODD protocol: A review and first update. Ecological Modelling, 221 (23), 2760–2768.

Haddaway, N. R., Collins, A. M., Coughlin, D., & Kirk, S. (2015). The role of google scholar in evidence reviews and its applicability to grey literature searching. PloS ONE 10(9).

Hardyns, W., & Rummens, A. (2018). Predictive policing as a new tool for law enforcement? Recent developments and challenges. European Journal on Criminal Policy and Research, 24 (3), 201–218. https://doi.org/10.1007/s10610-017-9361-2 .

Hart, T., & Zandbergen, P. (2014). Kernel density estimation and hotspot mapping: Examining the influence of interpolation method, grid cell size, and bandwidth on crime forecasting. Policing: An International Journal of Police Strategies & Management, 37 (2), 305–323. https://doi.org/10.1108/PIJPSM-04-2013-0039 .

Hassani, H., Huang, X., Silva, E. S., & Ghodsi, M. (2016). A review of data mining applications in crime. Statistical Analysis and Data Mining, 9 (3), 139–154. https://doi.org/10.1002/sam.11312 .

Holone, H. (2016). The filter bubble and its effect on online personal health information. Croatian Medical Journal, 57 (3), 298.

Hu, Y., Wang, F., Guin, C., Zhu, H. (2018). A spatio-temporal Kernel density estimation framework for predictive crime hotspot mapping and evaluation. Applied Geography 99:89–97. https://www.sciencedirect.com/science/article/pii/S0143622818300560 .

Huang, C., Zhang, J., Zheng, Y., & Chawla, N. V. (2018). DeepCrime: Attentive hierarchical recurrent networks for crime prediction. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management , (pp. 1423–1432). CIKM’18. New York, NY, USA: ACM. https://doi.org/10.1145/3269206.3271793 .

Hunt, J. M. (2016). Do crime hot spots move? Exploring the effects of the modifiable areal unit problem and modifiable temporal unit problem on crime hot spot stability. American University.

Ivaha, C., Al-Madfai, H., Higgs, G., & Ware, J. A. (2007). The dynamic spatial disaggregation approach: A spatio-temporal modelling of crime. In World Congress on Engineering , (pp. 961–966). Lecture Notes in Engineering and Computer Science. http://www.iaeng.org/publication/WCE2007/WCE2007_pp961-966.pdf .

Johansson, E., Gåhlin, C., & Borg, A. (2015). Crime hotspots: An evaluation of the KDE spatial mapping technique. In 2015 European Intelligence and Security Informatics Conference , (pp. 69–74). https://doi.org/10.1109/EISIC.2015.22 .

Kadar, C., Brüngger, R. R., & Pletikosa, I. (2017). Measuring ambient population from location-based social networks to describe urban crime. In International Conference on Social Informatics , (pp. 521–35). Springer, New York.

Kadar, C., & Pletikosa, I. (2018). Mining large-scale human mobility data for long-term crime prediction. EPJ Data Science . https://doi.org/10.1140/epjds/s13688-018-0150-z .

Kennedy, L. W., & Caplan, J. M. (2012). A theory of risky places . Newark: Rutgers Center on Public Security.

Kennedy, L. W., & Dugato, M. (2018). Forecasting crime and understanding its causes: Applying risk terrain modeling worldwide. European Journal on Criminal Policy and Research, 24 (4, SI), 345–350. https://doi.org/10.1007/s10610-018-9404-3 .

Kinney, J. B., Brantingham, P. L., Wuschke, K., Kirk, M. G., & Brantingham, P. J. (2008). Crime attractors, generators and detractors: Land use and urban crime opportunities. Built Environment, 34 (1), 62–74.

Liberati, A., Altman, D., Tetzlaff, J., Mulrow, C., Gøtzsche, P. C., Ioannidis, J. P. A., et al. (2009). The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: Explanation and elaboration. Journal of Clinical Epidemiology . https://doi.org/10.1016/j.jclinepi.2009.06.006 .

Liesenfeld, R., Richard, J. F., & Vogler, J. (2017). Likelihood-based inference and prediction in spatio-temporal panel count models for urban crimes. Journal of Applied Econometrics, 32 (3), 600–620. https://doi.org/10.1002/jae.2534 .

Lin, Y. L., Yen, M. F., & Yu, L. C. (2018). Grid-based crime prediction using geographical features. ISPRS International Journal of Geo-Information, 7 (8), 298.

Malik, A., Maciejewski, R., Towers, S., McCullough, S., & Ebert, D. S. (2014). Proactive spatiotemporal resource allocation and predictive visual analytics for community policing and law enforcement. IEEE Transactions on Visualization and Computer Graphics, 20 (12), 1863–1872. https://www.computer.org/csdl/trans/tg/2014/12/06875970-abs.html .

Mohler, G. (2014). Marked point process hotspot maps for homicide and gun crime prediction in Chicago. International Journal of Forecasting, 30 (3), 491–497. https://doi.org/10.1016/j.ijforecast.2014.01.004 .

Mohler, G., & Porter, M. D. (2018). Rotational grid, PAI-maximizing crime forecasts. Statistical Analysis and Data Mining: The ASA Data Science Journal , 11 (5), 227-236. https://doi.org/10.1002/sam.11389 .

Mohler, G., Raje, R., Carter, J., Valasik, M., & Brantingham, J. (2018). A penalized likelihood method for balancing accuracy and fairness in predictive policing. In 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC) . https://ieeexplore.ieee.org/abstract/document/8616417/ .

Mohler, G. O., Short, M. B., Brantingham, P. J., Schoenberg, F. P., & Tita, G. E. (2011). Self-exciting point process modeling of crime. Journal of the American Statistical Association, 106 (493), 100–108.

Mohler, G. O., Short, M. B., Malinowski, S., Johnson, M., Tita, G. E., Bertozzi, A. L., et al. (2015). Randomized controlled field trials of predictive policing. Journal of the American Statistical Association, 110 (512), 1399–1411. https://doi.org/10.1080/01621459.2015.1077710 .

Mu, Y., Ding, W., Morabito, M., & Tao, D. (2011). Empirical discriminative tensor analysis for crime forecasting. In International Conference on Knowledge Science, Engineering and Management (pp. 293-304). Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25975-3_26 .

Ohyama, T., & Amemiya, M. (2018). Applying crime prediction techniques to Japan: A comparison between risk terrain modeling and other methods. European Journal on Criminal Policy and Research, 24 (4), 469–487.

Ozkan, T. (2018). Criminology in the age of data explosion: new directions. The Social Science Journal . https://doi.org/10.1016/J.SOSCIJ.2018.10.010 .

Papamitsiou, Z., & Economides, A. A. (2014). Learning analytics and educational data mining in practice: A systematic literature review of empirical evidence. Journal of Educational Technology & Society, 17 (4), 49–64.

Perry, W. L. (2013). Predictive policing: The role of crime forecasting in law enforcement operations . Santa Monica: Rand Corporation.

Ratcliffe, J. (2015). What is the future… of predictive policing. Practice, 6 (2), 151–166.

Rodríguez, C. D., Gomez, D. M., & Rey, M. A. (2017). Forecasting time series from clustering by a memetic differential fuzzy approach: An application to crime prediction. In 2017 IEEE Symposium Series on Computational Intelligence (SSCI) , (pp. 3372–3379). https://ieeexplore.ieee.org/abstract/document/8285373 .

Rosser, G., Davies, T., Bowers, K. J., Johnson, D. S., & Cheng, T. (2017). Predictive crime mapping: Arbitrary grids or street networks? Journal of Quantitative Criminology, 33 (3), 569–594. https://doi.org/10.1007/s10940-016-9321-x .

Rumi, S. K., Deng, K., & Salim, F. D. (2018). Crime event prediction with dynamic features. EPJ Data Science. https://doi.org/10.1140/epjds/s13688-018-0171-7 .

Rummens, A., Hardyns, W., & Pauwels, L. (2017). The use of predictive analysis in spatiotemporal crime forecasting: Building and testing a model in an urban context. Applied Geography, 86, 255–261. https://doi.org/10.1016/j.apgeog.2017.06.011 .

Seele, P. (2017). Predictive sustainability control: A review assessing the potential to transfer big data driven ‘predictive policing’ to corporate sustainability management. Journal of Cleaner Production, 153, 73–86. https://doi.org/10.1016/j.jclepro.2016.10.175 .

Shamsuddin, N.H. M., Ali. N. A., & Alwee, R. (2017). An overview on crime prediction methods. In 6th ICT International Student Project Conference (ICT - ISPC), IEEE. https://ieeexplore.ieee.org/abstract/document/8075335/ .

Shoesmith, G. L. (2013). Space–time autoregressive models and forecasting national, regional and state crime rates. International Journal of Forecasting, 29 (1), 191–201. https://doi.org/10.1016/j.ijforecast.2012.08.002 .

Thongsatapornwatana, U. (2016). A survey of data mining techniques for analyzing crime patterns. In 2016 Second Asian Conference on Defence Technology (ACDT) , (pp. 123–28). https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7437655 .

Thongtae, P., & Srisuk, S. (2008). An analysis of data mining applications in crime domain. In X. He, Q. Wu, Q. V. Nguyen, & W. Ja (Ed.), 8th IEEE International Conference on Computer and Information Technology Workshops , (pp. 122–126). https://doi.org/10.1109/CIT.2008.Workshops.80 .

Vlahogianni, E. I., Karlaftis, M. G., & Golias, J. C. (2014). Short-term traffic forecasting: where we are and where we’re going. Transportation Research Part C: Emerging Technologies, 43, 3–19.

Wang, X., & Brown, D. E. (2011). The Spatio-Temporal Generalized Additive Model for Criminal Incidents. In Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics , (pp. 42–47). IEEE, New York.

Wang, X., Brown, D. E., Gerber, M.S. (2012). Spatio-temporal modeling of criminal incidents using geographic, demographic, and twitter-derived information. In 2012 IEEE International Conference on Intelligence and Security Informatics , (pp. 36–41). IEEE, New York.

Wang, H., Kifer, D., Graif, C., & Li, Z. (2016). Crime rate inference with big data. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , (pp. 635–644). KDD’16. New York, NY, USA: ACM. https://doi.org/10.1145/2939672.2939736 .

Williams, M. L., & Burnap, P. (2015). Cyberhate on social media in the aftermath of woolwich: A case study in computational criminology and big Data. British Journal of Criminology, 56 (2), 211–238.

Yang, D., Heaney, T., Tonon, A., Wang, L., & Cudré-Mauroux, P. (2018). CrimeTelescope: crime hotspot prediction based on urban and social media data fusion. World Wide Web , 21 (5), 1323–1347. https://doi.org/10.1007/s11280-017-0515-4 .

Yu, C. H., Ward, M. W., Morabito, M., & Ding, W. (2011). Crime forecasting using data mining techniques. In IEEE 11th International Conference on Data Mining Workshops , (pp. 779–786). IEEE, New York.

Zhao, X., & Tang, J. (2017). Modeling temporal-spatial correlations for crime prediction. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management , (pp. 497–506). CIKM’17. New York, NY, USA: ACM. https://doi.org/10.1145/3132847.3133024 .

Zhuang, Y., Almeida, M., Morabito, M., & Ding. W. (2017). Crime hot spot forecasting: A recurrent model with spatial and temporal information. In X. D, Wu, T. Ozsu, J. Hendler, R. Lu, (Ed.), IEEE International Conference on Big Knowledge (ICBK) , (pp. 143–150). https://doi.org/10.1109/ICBK.2017.3 .


This research was funded by the Austrian Science Fund (FWF) through the Doctoral College GIScience at the University of Salzburg (DK W 1237-N23).

Author information

Authors and affiliations.

Department of Geoinformation Processing, University of Twente, Enschede, The Netherlands

Ourania Kounadi

Doctoral College GIScience, Department of Geoinformatics-Z_GIS, University of Salzburg, Salzburg, Austria

Alina Ristea & Michael Leitner

Boston Area Research Initiative, School of Public Policy and Urban Affairs, Northeastern University, Boston, MA, USA

Alina Ristea

Department of Informatics and Applied Mathematics, Federal University of Rio Grande do Norte, Natal, RN, Brazil

Adelson Araujo Jr.

Department of Geography and Anthropology, Louisiana State University, Baton Rouge, LA, USA

Michael Leitner


Contributions

OK designed the initial idea, conceived and designed all experiments, analyzed the data and wrote the paper. AR helped designing the paper and running experiments. She also helped writing parts of the paper, with a focus on forecasting performance. AAJr. gave technical support and helped with running experiments and conceptual framework. He also helped writing the manuscript, focusing on forecasting algorithms. ML supervised the work and edited the final manuscript. All authors discussed the results and implications and commented on the manuscript at all stages. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Alina Ristea .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

Online survey on Risk of Bias across Studies.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Kounadi, O., Ristea, A., Araujo, A. et al. A systematic review on spatial crime forecasting. Crime Sci 9 , 7 (2020). https://doi.org/10.1186/s40163-020-00116-7


Received : 17 November 2019

Accepted : 11 May 2020

Published : 27 May 2020

DOI : https://doi.org/10.1186/s40163-020-00116-7


  • Forecasting
  • Predictive policing
  • Spatiotemporal
  • Spatial analysis

Crime Science

ISSN: 2193-7680


Online Crime Reporting System—A Model

  • Conference paper
  • First Online: 20 March 2021


  • Mriganka Debnath 38 ,
  • Suvam Chakraborty 38 &
  • R. Sathya Bama Krishna 38  

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 708))

Included in the following conference series:

  • International Conference on Emerging Trends and Advances in Electrical Engineering and Renewable Energy


Crime, an unlawful act, is increasing in our society day by day. With the advancement of technology, criminals are also finding new ways to commit crimes. Crime takes place for a few common reasons: money, mental imbalance, and emotion. After a crime takes place, victims must go through a very complicated and lengthy process to report it at the police station. It is also a very hectic process for the crime branch to handle reports manually and maintain the records. An online crime reporting system is therefore a solution for victims and for the crime department alike. It will not only make the work easier but will also give users access to features such as a news feed and updates regarding crime taking place in their locality. It will bring the police and victims closer and hence increase security. It makes First Information Report (FIR) registration simple and easy, and hence time-efficient. The police can also issue alerts to citizens regarding most-wanted persons, lost belongings, and any kind of emergency through this system. As a result, this system will give a sustainable solution to users, police, and victims for managing crime in a better and more structured way.




Author information

Authors and affiliations.

Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, India

Mriganka Debnath, Suvam Chakraborty & R. Sathya Bama Krishna


Corresponding author

Correspondence to R. Sathya Bama Krishna .

Editor information

Editors and affiliations.

Department of Electrical and Electronics Engneering, Sikkim Manipal Institute of Technology, Rangpo, Sikkim, India

Akash Kumar Bhoi

School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT Deemed to be University), Bhubaneswar, Odisha, India

Pradeep Kumar Mallick

Department of Automation and Applied Informatics, “Aurel Vlaicu” University of Arad, Arad, Romania

Valentina Emilia Balas

Bhabani Shankar Prasad Mishra


Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper.

Debnath, M., Chakraborty, S., Sathya Bama Krishna, R. (2021). Online Crime Reporting System—A Model. In: Bhoi, A.K., Mallick, P.K., Balas, V.E., Mishra, B.S.P. (eds) Advances in Systems, Control and Automations . ETAEERE 2020. Lecture Notes in Electrical Engineering, vol 708. Springer, Singapore. https://doi.org/10.1007/978-981-15-8685-9_34


DOI : https://doi.org/10.1007/978-981-15-8685-9_34

Published : 20 March 2021

Publisher Name : Springer, Singapore

Print ISBN : 978-981-15-8684-2

Online ISBN : 978-981-15-8685-9

eBook Packages : Intelligent Technologies and Robotics (R0)




Crime Reports and Statistics


I. Introduction
II. What Are Crime Reports and Statistics, and Why Are They Important?
III. Who Publishes Crime Reports and Statistics, and How Do They Do It?
IV. The Federal Bureau of Investigation and the UCR Program
  A. What Crimes Are Measured in the UCR?
  B. The Future of the UCR Program: The National Incident-Based Reporting System
  C. Advantages and Disadvantages of UCR Data
V. The Bureau of Justice Statistics and the NCVS
  A. NCVS Methodology
  B. Crimes Measured in the NCVS
  C. The Future of the NCVS
  D. Advantages of the NCVS
  E. Disadvantages of the NCVS
VI. UCR Data and the NCVS Compared
VII. Conclusion

The purpose of this research paper is to provide an overview of crime reports and statistics. Crime reports and statistics convey an extensive assortment of information about crime to the reader and include topics such as the extent of crime and the nature or characteristics of criminal offenses, as well as how the nature and characteristics of crime change over time. Aside from these big-picture topics related to crime, crime reports and statistics communicate specific information on the characteristics of the criminal incident, the perpetrator(s), and the victim(s). For example, crime reports and statistics present information on the incident, such as weapon presence, police involvement, victim injury, and location of the crime. Details such as the age, race, gender, and gang membership of the offender are also available in many of these reports. Also, details gleaned from statistics regarding the victim, such as, but not limited to, income, race, age, relationship with the offender, education, and working status, are made available. Crime reports can convey information that pertains to the complete population of individuals and/or businesses, or they can convey crime-related information on a subset of victims, such as males, the elderly, businesses, or the poor. Crime reports and statistics can focus on a short period of time, such as a month, or they can cover longer periods, such as 1 year or many years. In addition, these reports can describe change in crime and its elements over time. Statistics offered in crime reports may describe crime as it pertains to a small geographical region, such as one city; one region, such as the West or the Northeast; or the entire nation. Finally, on the basis of statistics, these reports can describe crime in a static, point-in-time way, and they can provide a dynamic perspective describing how crime, its characteristics, or risk change over time.

Topics covered in this research paper include a discussion of what crime reports and statistics are as well as why they are important. Information presented includes what agencies publish crime reports and statistics as well as a brief history of these bureaus. Because crime reports and statistics are social products, it is imperative to present information on the data used to generate them. Two major data sources are used to generate crime reports and statistics: (1) the Uniform Crime Reports (UCR) and (2) the National Crime Victimization Survey (NCVS). The data these programs yield, as well as the methodology and measurement they use, are described. Because no data are perfect, a description of their advantages and disadvantages is presented. Because these data are the two primary sources of crime information in the United States, the research paper explores a comparison of these data. Given that entire textbooks can be devoted to the topic of crime reports and statistics, this research paper provides readers with a relatively short overview of the major topics related to these important items. For readers who wish to delve into the topic in greater detail, a list of recommended readings is provided at the close of the research paper.

To fully appreciate the information found in crime reports and the statistics used to summarize them, one must be aware of what is meant by crime reports and statistics, why this topic is important, who is responsible for the creation of reports and statistics, and how the reports and statistics are created. To address these important issues, the research paper is structured around these significant questions. It first addresses the question “What are crime reports and statistics, and why are they important?” Next, it asks, “What agency is responsible for crime reports and statistics?” In answering this question, the research paper presents past and current information about the Federal Bureau of Investigation (FBI) and the Bureau of Justice Statistics (BJS). Next, the research paper moves to addressing the closely related question “How are crime reports and statistics generated?” This portion of the research paper is the lengthiest, because it offers information on the nuts and bolts of the UCR and the NCVS, including a look at the history of the programs as well as future directions. Included also is a discussion of the methodology, advantages, and disadvantages of each program.

Crime reports describe information about crime and cover an almost endless array of crime topics. They can focus on specific crimes, types of victims, types of offenders, and/or characteristics of the offenses. A useful tool for conveying information about crime in crime reports is statistics. Statistics are merely numerical measures used to summarize a large amount of information—in this case, information on crime. For example, if one noted that, on average in a particular year, 50% of violent crime was reported to the police, that person has simply summarized crime data and presented a simple, meaningful number (50%) about that particular phenomenon (crime reporting).
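As a trivial illustration of such a summary statistic, a reporting rate can be computed from individual incident records (dummy data, our own example):

```python
# Dummy victimization records: one entry per violent incident, True if reported to police
reported_to_police = [True, False, True, True, False, False, True, False, True, False]

reporting_rate = 100 * sum(reported_to_police) / len(reported_to_police)
print(f"{reporting_rate:.0f}% of violent crimes were reported to the police")  # 50%
```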

Crime reports and statistics are vital to the study of criminology. Without these tools, our understanding of what kind of crime is occurring, how often crime is being committed, who is committing crime, who is being victimized, and the characteristics of offenses would be little more than guesses. Aside from a pure information utility, crime reports and accompanying statistics serve as an important indicator of the “health” of society. A rising crime rate suggests that society is ailing. Unequal victimization risk among groups of individuals suggests a societal ill in need of attention. Conversely, a reduction in crime conveyed by these reports and statistics is one indicator of an improved quality of life. An equally important function served by crime reports and statistics is to assist researchers in the development and testing of crime and victimization theories. Another important function of crime reports is providing policymakers valuable, empirically based information so they can design policies to further reduce crime, better assist crime victims, and effectively deal with offenders. Without reliable information on crime, policies designed to reduce crime and victimization would not only be ineffective but would also represent misappropriated or wasted valuable resources.

In general, the federal government publishes crime reports and statistics. The department within the federal government responsible for these publications is the U.S. Department of Justice. Within the Department of Justice, publications are generated by two bureaus: (1) the FBI and (2) the BJS. Because these documents are generated using taxpayer dollars, more recent crime reports (i.e., those published since about 1995) are available free to the public online at the respective bureaus’ websites (http://www.fbi.gov and http://www.ojp.usdoj.gov/bjs).

Most individuals are aware of the crime-fighting responsibilities of the FBI. Fewer know that the FBI’s responsibilities also include those of crime data compiler, crime data analyst, and publisher of crime reports for the United States. These responsibilities are accomplished through the UCR program, which compiles crime reports submitted voluntarily either directly by local, state, federal, or tribal law enforcement agencies or through centralized state agencies across the country. Although there are some exceptions, in general, UCR data are submitted to the FBI on a monthly basis. The crime information gathered via the UCR program comprises the nation’s oldest unified national crime data. Although the crime data may be the nation’s oldest, it took approximately 50 years of calls for such data before the UCR program started collecting crime data in 1930, and even then the crime report collection did not originate in the Bureau of Investigation (the precursor to the FBI); instead, a collective of police chiefs was responsible for the commencement of one of the nation’s two major crime information sources.

Calls for unified national statistics on crime were first made in the 19th century. Although crime data had been collected for a long time, this collection was conducted at the state and local levels by some jurisdictions only. This was problematic, because no two states defined crimes in the same way. Neither did each jurisdiction necessarily collect information on the same crimes. Because of this, there was no way to aggregate this information in any meaningful way to get a unified picture of the national crime situation, and without standard offenses, officials could not make comparisons across jurisdictions. In 1870, the Department of Justice was established. At this time, Congress mandated the reporting of annual crime statistics. A short time later, in 1871, an appeal for unified national crime information was made at the convention of the National Police Association, an organization that later became known as the International Association of Chiefs of Police (IACP). Unfortunately, neither the establishment of the Department of Justice nor the call of police chiefs resulted in the collection of national crime information.

About 50 years later, in the late 1920s, the IACP established a Committee on Uniform Crime Records to resolve this gap in crime information. The purpose of the committee was to develop a program as well as procedures for collecting information about the extent of crime in the United States. The product of this work was the UCR. Initiated in 1927, this program was designed to provide unified, reliable, and systematic information on a set of serious crimes reported to law enforcement agencies across the country. Using these data, police chiefs could compare crime across jurisdictions and time. The IACP managed the UCR program for several years, until the responsibility moved to the FBI in 1935.

The UCR program initially included crime reports from 400 law enforcement agencies from 43 states, describing crime for approximately 20% of the population. Over time, the program has grown, and it now gathers crime reports from approximately 17,000 law enforcement agencies from all states, the District of Columbia, and some U.S. territories. Furthermore, the UCR program now describes crime as it occurs in almost the entire nation. The purpose of the UCR program started as, and continues to be, serving the needs of law enforcement agencies.

The UCR program gathers information on a wide variety of criminal offenses. Since 1985, these crimes have been partitioned into Part I and Part II crimes. Part I offenses include eight crimes that are considered to be serious and occur regularly. The frequency of these offenses means that enough information can be gathered to enable comparisons regarding crime across time and across jurisdiction. The eight Part I offenses include the following: (1) murder and nonnegligent manslaughter, (2) forcible rape, (3) robbery, (4) aggravated assault, (5) burglary, (6) larceny–theft, (7) motor vehicle theft, and (8) arson.

Part II crimes are also considered serious offenses; however, they differ from Part I offenses in that they occur relatively less frequently. Because of the infrequent nature of these events, reliable comparisons between jurisdictions or over time for these offenses are not often possible. The following are Part II criminal offenses:

  • Other assaults (simple)
  • Forgery and counterfeiting
  • Corporate fraud
  • Embezzlement
  • Buying, receiving, and possessing stolen property
  • Possession and carrying of a weapon
  • Prostitution and commercialized vice
  • Drug abuse violations
  • Nonviolent and unlawful offenses against family and children
  • Driving under the influence
  • Liquor law violations
  • Drunkenness
  • Disorderly conduct
  • All other violations of state or local laws not specified (except traffic violations)
  • Suspicion, that is, arrested and released without formal charges
  • Curfew violations and loitering

The UCR program offers more than simply counts of each crime. Depending on the crime, it also offers details of the criminal incident. The crime for which there is the greatest detail in the UCR is murder and nonnegligent manslaughter. Using Supplemental Homicide Reporting forms, the FBI gathers information on the homicide victim’s age, sex, and race; the offender’s age, race, and sex; weapon type (if any); the victim–offender relationship; and the circumstances that led to the homicide. For other crimes, some, but not many, details are available. For instance, one can ascertain whether a rape was completed or attempted, whether a burglary involved forcible entry, the type of motor vehicle stolen, and whether a robbery included a weapon.

Since the UCR program was launched, little has changed in terms of the data collected. One exception is the addition of arson as a Part I crime. Over time, it became clear that change was needed in the UCR program. For example, the lack of incident-level detail for the offense data gathered was viewed as a significant limitation. In fact, most scholars refer to the UCR program as the UCR summary program, because it collects only aggregate-level information on the eight Part I index crimes over time. Another problem is that some crime definitions had become dated. In response to these and other concerns, evaluations by the FBI, the Bureau of Justice Statistics, the IACP, and the National Sheriffs’ Association in the late 1970s and early 1980s concluded that the UCR program was in need of modernization and enhancements to better serve its major constituency: law enforcement. The final report of these evaluations and recommendations is available in Blueprint for the Future of the Uniform Crime Reporting Program.

The resulting redesign, introduced in the mid-1980s, is the UCR program’s National Incident-Based Reporting System (NIBRS). As the name indicates, data submitted to the FBI include the nature and types of crimes in each incident, victim(s) and offender(s) characteristics, type and value of stolen and recovered property, and characteristics of arrested individuals. In short, the NIBRS offers much more comprehensive and detailed data than the UCR.

The NIBRS, like the traditional UCR summary program, is voluntary, reflects crimes known to the police, and gathers data on the same crimes as the summary program. Although the two systems share some characteristics, major differences exist. A significant difference is that the NIBRS has the capacity to collect incident-level details for all crimes covered. Another difference between the two programs is that the nomenclature of Part I and Part II offenses was discarded in favor of Group A and Group B classes of offenses in the NIBRS. Group A crimes are substantially more inclusive than Part I offenses and consist of 22 crime categories covering 46 offenses, some of which are listed here:

  • Homicide (murder and nonnegligent manslaughter, negligent manslaughter, justifiable homicide [which is not a crime])
  • Sex offenses, forcible (forcible rape, forcible sodomy, sexual assault with an object, forcible fondling)
  • Assault (aggravated, simple, intimidation)
  • Burglary/breaking and entering
  • Larceny–theft
  • Motor vehicle theft
  • Sex offense, nonforcible
  • Counterfeiting/forgery
  • Destruction/damage/vandalism of property
  • Drug/narcotic offenses
  • Pornography/obscene material
  • Prostitution
  • Extortion/blackmail
  • Gambling offenses
  • Kidnapping/abduction
  • Stolen property offenses
  • Weapon law violations

Group B comprises 11 offense categories and covers all crime that does not fall into Group A; some of these offenses are listed here:

  • Curfew/loitering/vagrancy
  • Family offense/nonviolent
  • Peeping tom
  • Trespass of real property
  • All other offenses

In the NIBRS, law enforcement agencies are categorized as full-participation agencies or limited-participation agencies. Full-participation agencies are those that can submit data without placing any new burden on the officers preparing the reports and that have sufficient data-processing and other resources to meet FBI reporting requirements. Full-participation agencies submit data on all Group A and Group B offenses. Limited-participation agencies are unable to meet the offense-reporting requirements of full-participation agencies. These agencies submit detailed incident information only on the eight Part I UCR offenses.

Yet another departure from the traditional UCR summary system is that although the NIBRS collects data on many of the same crimes, it uses some revised and new offense definitions. For example, in the traditional UCR summary program, only a female can be a victim of a forcible rape. The NIBRS redefines forcible rape as “the carnal knowledge of a person,” allowing males to be counted as victims of these offenses. A new offense category included in the NIBRS is called crimes against society; these include drug/narcotic offenses, pornography/obscene material, prostitution, and gambling offenses. An important difference between the UCR and the NIBRS is that the NIBRS enables one to distinguish between an attempted and a completed crime. Previously, no such distinction was available. A significant improvement of NIBRS data is the ability to link attributes of a crime. For instance, in the traditional system, with the exception of homicide, one could not link offender information, victim information, and incident information. With the NIBRS, one can link data on victims to offenders to offenses to arrestees.

In the NIBRS, the hierarchy rule was changed dramatically. In the traditional system, the hierarchy rule prevented one from counting an incident multiple times when multiple offenses occurred within the same incident. Using the hierarchy rule, law enforcement agencies determined the most serious offense in an incident and reported only that offense to the FBI. With the NIBRS, all offenses in a single incident are recorded and can be analyzed. Some researchers have reported that the hierarchy rule has been completely suspended in the NIBRS, but this is incorrect. Two exceptions to the hierarchy rule remain. First, if a motor vehicle is stolen (motor vehicle theft) and there were items in the car (property theft), only the motor vehicle theft is reported. Second, in the event of a justifiable homicide, two offenses are reported: (1) the felonious act by the offender and (2) the justifiable homicide itself. In the NIBRS, the hotel rule was modified as well. The hotel rule states that when a burglary occurs in a dwelling or facility in which multiple units are burglarized (e.g., a hotel) and the burglary is most likely to be reported to the police by the manager of the dwelling, the incident is counted as a single offense. In addition, the NIBRS has extended the hotel rule to self-storage warehouses, or mini-warehouses.
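As a rough illustration of this counting difference, the sketch below applies a simplified, hypothetical seriousness ordering to a made-up multi-offense incident; it is not the FBI’s official scoring procedure and ignores the exceptions described above.

```python
# Simplified illustration of traditional UCR hierarchy-rule counting versus
# NIBRS-style counting for one multi-offense incident. The seriousness
# ordering and the incident are hypothetical examples.

SERIOUSNESS = ["murder", "forcible rape", "robbery", "aggravated assault",
               "burglary", "larceny-theft", "motor vehicle theft", "arson"]

incident_offenses = ["burglary", "aggravated assault", "larceny-theft"]

# Traditional UCR summary system: report only the most serious offense.
most_serious = min(incident_offenses, key=SERIOUSNESS.index)
print("UCR summary count:", [most_serious])   # ['aggravated assault']

# NIBRS: every offense in the incident is recorded and can be analyzed.
print("NIBRS count:", incident_offenses)
```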

The traditional UCR summary reporting system is characterized by many advantages. First, it has been ongoing for more than eight decades with remarkably stable methodology. This aspect allows meaningful trend analysis. Second, the UCR allows analyses at a variety of levels of geography. One can ascertain crime information for cities, regions, or the nation. Third, this system offers broad crime coverage, ranging from vandalism to homicide. Fourth, instead of focusing only on street crimes (i.e., homicide, robbery, and assault), the UCR offers information on other crimes, such as embezzlement, drunkenness, and vagrancy. Fifth, the UCR summary system has broad coverage from law enforcement agencies. All 50 states, the District of Columbia, and some U.S. territories report data to the FBI. Sixth and last, the UCR collects crime information regardless of the age of the victim or offender. Some crime data collection systems (e.g., the National Crime Victimization Survey) gather crime data on restricted age ranges only. The NIBRS enjoys many of the UCR’s advantages and more. The greatest additional advantage of the NIBRS is that it offers incident-level details for every crime reported. With greater detail, one can disaggregate data by multiple victim, offender, and incident characteristics. One also can link various components of the incident.

Both the traditional summary system and the NIBRS have limitations that are important to recognize. First, both systems reflect only crimes reported to the police. Evidence is clear that many crimes are reported to the police at low rates. For example, only about half of all violent crime comes to the attention of the police. In some cases, such as rape, fewer than 30% of the crimes are reported to the police. Second, because the data come from law enforcement agencies, they can be manipulated for political and societal purposes. Although this is not considered to be a widespread problem, it can and has happened. Third, because the UCR reporting systems are voluntary, they are subject to a lack of, or incomplete, reporting by law enforcement agencies. When information is not submitted or the submitted information does not meet the FBI’s guidelines for completeness and accuracy, the FBI uses specific protocols to impute data to account for this issue. The degree to which UCR data are imputed at the national level is sizeable and varies annually.

The NIBRS is characterized by some disadvantages not shared with the traditional UCR system. First, the NIBRS has limited coverage. It requires a lengthy certification process, and scholars have suggested that this has slowed conversion to the system. As of 2007, 31 states were certified and contributing data to the program. This represents reporting by 37% of law enforcement agencies and coverage of approximately 25% of the U.S. population. Furthermore, not all agencies within certified states submit NIBRS data. In 2004, only 7 states fully reported NIBRS data. The agencies that do participate tend to represent smaller population areas. As recently as 2005, no agency covering a population of over 1 million participated in the NIBRS. Given this, it is clear that NIBRS data do not constitute a representative sample of the population, law enforcement agencies, or states.

The second major publisher of national crime reports and statistics is the BJS, the primary statistical agency in the Department of Justice. This bureau was established under the Justice Systems Improvement Act of 1979. Prior to this, the office was known as the National Criminal Justice Information and Statistics Service, which was a part of the Law Enforcement Assistance Administration. Currently, the BJS is an agency in the Office of Justice Programs within the Department of Justice. The mission of the BJS is to gather and analyze crime data; publish crime reports; and make this information available to the public, policymakers, the media, government officials, and researchers.

Although the BJS collects a wide variety of data related to all aspects of the criminal justice system, its major crime victimization data collection effort is currently the National Crime Victimization Survey (NCVS). The NCVS is the nation’s primary source of information about the frequency, characteristics, and consequences of victimization of individuals age 12 and older and their households in the United States. The survey was first fielded in 1972 as the National Crime Survey (NCS). The NCS was designed with three primary purposes. First, it was to serve as a benchmark for UCR statistics on crime reported to police. Second, the NCS was to measure what was called “the dark figure of unreported crime,” that is, crime unknown by law enforcement. Third, the NCS was designed to fill a perceived need for information on the characteristics of crime not provided by the UCR.

Shortly after the fielding of the NCS, work toward improving the survey began. Beginning in 1979, plans were developed for a thorough redesign to improve the survey’s ability to measure victimization in general and certain difficult-to-measure crimes, such as rape, sexual assault, and domestic violence, in particular. The redesign was implemented in 1992 using a split-sample design. It is at this time that the NCS changed its name to the NCVS. The first full year of NCVS data based on the redesign was available in 1993. Following the redesign, the NCVS measured almost the identical set of crimes gathered in the NCS. The only exception is that, after the redesign, data on sexual assault were collected.

In general, and as anticipated, the NCS redesign resulted in an increase in the number of crimes counted. Increases measured were not uniform across crime types, however. For example, increases in crimes not reported to the police were greater than the increases in crimes reported to the police. One reason for this is that improved cues for certain questions caused respondents to recall more of the less serious crimes—those that are also less likely to be reported to law enforcement officials. As a result, the percentage of crimes reported to police based on the redesigned survey is lower than the percentage calculated based on data collected with the previous survey design. This difference is particularly significant for crimes such as simple assault, which does not involve the presence of weapons or serious injury.

NCVS crime data come from surveys administered at a sample of households in the United States. Households are selected via a stratified, multistage, cluster sample. The samples are designed to be representative of households, as well as of noninstitutionalized individuals age 12 or older in the United States. The NCVS is characterized by a very large sample size. In recent years, approximately 80,000 persons in 40,000 households were interviewed. The NCVS is also characterized by a rotating-panel design in which persons are interviewed every 6 months for a total of seven interviews. Interviews are conducted in person and over the telephone throughout the year.

NCVS surveys are administered via two survey instruments. The first is a screening instrument that is used to gather information to determine whether a respondent was a victim of a threatened, attempted, or completed crime during the preceding 6 months. If the screening instrument uncovers a possible victimization, a second, incident-focused instrument is administered to gather detailed characteristics about all victimizations revealed. These details include victim characteristics, offender characteristics, and characteristics of the incident.

The details gathered on the incident instrument are used in two very important ways. Detailed incident information is used to determine whether the incident described by the respondent was a crime and, if the incident is deemed a crime, the type of crime that occurred. These assessments are made not by the field representative or the survey respondent but by statisticians using incident details during data processing at the U.S. Census Bureau, the agency responsible for collecting the data on behalf of the BJS.

Because one of the major purposes of the NCVS was to serve as a benchmark for UCR summary program statistics on crime reported to police, and to measure the “dark figure” of unreported crime, the offenses measured by the NCVS are analogous to Part I crimes measured by the UCR. NCVS criminal offenses measured include rape, sexual assault, robbery, aggravated assault, simple assault, pocket-picking and purse-snatching, property theft, burglary, and motor vehicle theft.

The NCVS gathers far more than merely information on the types of personal and property crimes in the United States against persons age 12 or older. For each victimization revealed, extensive detailed information is collected. This includes the outcome of the victimization (completed, attempted); time and location of the incident; the number of victims, bystanders, and offenders; victim demographics; victim–offender relationship; offender demographics; offender drug and/or alcohol use; gang membership; presence of weapon(s); injuries sustained; medical attention received; police contact; reasons for or against contacting the police; police response; victim retaliation; value of retaliation; and so on.

Currently, the future of the NCVS is unclear. During 2007 and 2008, the Committee on National Statistics, in cooperation with the Committee on Law and Justice, reviewed the NCVS to consider options for conducting it. The need for review grew from evidence that the effectiveness of the NCVS has been undermined by the demands of conducting an expensive survey in a persistently flat budgetary environment. Given this situation, the BJS has implemented many cost-saving strategies over time, including multiple sample cuts. Unfortunately, the result of sample cuts (in conjunction with falling crime rates) is that, for the last several years, the sample size has been such that only a year-to-year change of 8% or more can be deemed statistically different from no change at all. On the basis of the review, the panel concluded that the NCVS as it currently stands is not able to achieve its legislatively mandated goal of collecting and analyzing data. The review panel provided multiple recommendations regarding a redesign of the NCVS that are currently being studied. At this time, it is unclear what a redesign would entail, or even if a redesign will happen. One possibility—not embraced by the review panel—is the termination of the NCVS. Such an outcome would be unfortunate given that the survey provides the only nationally representative data on crime and victimization with extensive details on the victim, the offender, and the incident.

A major advantage of the NCVS is that it provides data on reported and unreported crimes. As stated previously, many crimes (and in some cases, e.g., rape, most crimes) are not reported to police. A second advantage of NCVS data is that they offer a wide range of criminal victimization variables, including information about crime victims (e.g., age, gender, race, Hispanic ethnic origin, marital status, income, and educational level), criminal offenders (e.g., gender, race, approximate age, drug/alcohol use, and victim–offender relationship), and the context of the crime (e.g., time and place of occurrence, use of weapons, nature of injury, and economic consequences). A third advantage of NCVS data is the high response rates. Like all surveys, response rates in the NCVS have declined a bit in recent years; nonetheless, they continue to be relatively high. For example, between 1993 and 1998, NCVS response rates varied between 93% and 96% of eligible households and between 89% and 92% of eligible individuals. A fourth advantage of NCVS data is that the survey has been ongoing for over three decades with a stable sample and methodology. This makes trend analysis possible, and it allows one to aggregate data in an effort to study relatively rare crimes, such as rape, or relatively small populations, such as American Indians.

The NCVS performs very well for the purposes designed; however, like all surveys, it has some limitations. First, the NCVS is designed to generate national estimates of victimization. Because of this, the data cannot be used to estimate crime at most other geographic levels, such as the state, county, or local level. In 1996, a region variable was added to the NCVS data, enabling crime estimates for the Northeast, South, West, and Midwest. On rare occasions, special releases of NCVS data have provided insight into crime in major cities. Limited age coverage is a second limitation of NCVS data. Because the data do not include victimizations of persons age 11 or younger, findings are not generalizable to this group. A third limitation is limited population coverage. Because one must live in a housing unit or group quarter to be eligible for the NCVS sample, persons who are crews of vessels, in institutions (e.g., prisons and nursing homes), members of the armed forces living in military barracks, or homeless are excluded from the NCVS sample and data. A fourth limitation is limited crime coverage. The NCVS collects data on the few personal and property crimes listed earlier and excludes many others. NCVS data tend to focus on street crimes, excluding other offenses, such as arson, crimes against businesses, stalking, vagrancy, homicide, embezzlement, and kidnapping. A fifth limitation of the NCVS data stems from the fact that they are derived from a sample. Like all sample surveys, the NCVS is subject to sampling and nonsampling error. Although every effort is taken to reduce error, some remains. One source of nonsampling error stems from the inability of some respondents to recall in detail the crimes that occurred during the 6-month reference period. Some victims also may not report crimes committed by certain offenders (e.g., spouses), others may simply forget about victimizations experienced, and still others may experience violence on a frequent basis and may not view such incidents as important enough to report to an NCVS field representative. A final limitation is associated with what are referred to as series victimizations. Series victimizations are defined as six or more similar but separate victimizations that the victim is unable to recall individually or to describe in detail to an interviewer. Recall that crime classification in the NCVS is based on the respondent’s answers to several incident questions. Without information on each incident, crime classification cannot occur. To address series victimization, a specific protocol is used. This protocol states that if an individual was victimized six or more times in a similar fashion during the 6-month reference period, and he or she cannot provide the details about each incident, then one report is taken for the entire series of victimizations. Details of the most recent incident are obtained, and the victimization is counted as a singular incident. It is clear that the series protocol results in an underestimate of the actual rate of victimization.
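To illustrate the effect of the series protocol just described, the sketch below applies the six-or-more rule to hypothetical recall counts; the numbers are invented, and the sketch reflects only the counting rule as summarized above.

```python
# Hypothetical illustration of the NCVS series-victimization protocol.
# Under the protocol described above, six or more similar incidents that the
# respondent cannot describe individually are counted as one victimization.

def counted_victimizations(recalled_incidents, can_describe_each):
    if recalled_incidents >= 6 and not can_describe_each:
        return 1  # the entire series is counted as a single incident
    return recalled_incidents

print(counted_victimizations(9, can_describe_each=False))  # 1 (series counted once)
print(counted_victimizations(4, can_describe_each=False))  # 4 (not a series)
```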

Because of the similarities between the UCR and the NCVS, it is generally expected that each data source will provide the same story about crime in the United States. Although that does often happen, many times it does not. Since 1972, year-to-year violent crime change estimates from the NCVS and UCR have moved in the same direction, either up or down, about 60% of the time. Property crime rates have moved in the same direction about 75% of the time. Given that the NCVS and the UCR have different purposes and different methodologies, study different populations, examine different types of crimes, and count offenses and calculate crime rates differently, a lack of congruence on occasion should not be surprising. This section of the research paper looks at some of the reasons the two series do not always track together.

Perhaps the largest difference between the UCR and NCVS is that the UCR measures only crimes reported to law enforcement agencies; that is, if the crime was not reported to the police, that crime can never be reflected in UCR data. In contrast, the NCVS interviews victims of crime and collects information on crimes that were and were not reported to the police. A second major difference in the two systems is found in the population coverage. UCR data include all reported crimes regardless of victim characteristics. This includes crimes against young children, visitors from other countries, and businesses or organizations. In contrast, the NCVS provides data on reported and unreported crimes against people age 12 or older and their households. Not included in the NCVS data are crimes against persons younger than age 12, businesses, homeless people, and institutionalized persons. A third significant difference in the two systems is crime coverage. The Part I UCR summary reporting system includes homicide and arson, neither of which is measured by the NCVS. In contrast, the NCVS collects information on simple assault—the most frequent violent crime—whereas the UCR’s traditional Part I crimes exclude it. In addition, the NCVS and UCR define some crimes differently and count some crimes differently. As stated earlier, the UCR defines forcible rape as “the carnal knowledge of a female forcibly and against her will” and excludes rapes of males and other forms of sexual assault. The NCVS measures rape and sexual assault of both women and men.

Yet another significant difference concerns the basic counting unit of the two data collection systems. In the NCVS, the basic counting unit is the victim. There are two types of victims in the NCVS: (1) the person and (2) the household. When considering personal or violent crimes (i.e., rape, sexual assault, robbery, assault, purse-snatching, or pocket-picking), the number of victimizations is equal to the number of persons victimized. When considering property crimes (i.e., property theft, household burglary, and motor vehicle theft), the number of victims is equal to the number of households victimized. Therefore, crime reports using NCVS data report rates of violent crime as the number of victimizations per 1,000 people age 12 or older. Likewise, property crimes are reported as the number of property victimizations per 1,000 households. In the UCR, the basic counting unit is the offense. For some crimes, such as assault and rape, an offense is equal to the number of victims. For other crimes, such as burglary or robbery, an offense is equal to the number of incidents. All UCR crime rates, regardless of the type of victim (i.e., individual or organization), are calculated on a per capita basis: the number of offenses per 100,000 people. For some crimes, the NCVS and UCR counting rules result in similar outcomes. For instance, if in a single incident two people were assaulted by a knife-wielding offender, both programs would count two aggravated assaults. In contrast, other times the counting rules result in different outcomes. For example, if in a single incident five people were robbed by a gun-toting offender, the NCVS would record five robbery victimizations, and the UCR would count a single robbery. If, however, a bank teller was threatened by an armed assailant during a bank robbery, the UCR would record this as a robbery with a weapon, whereas the NCVS, which measures only crimes against people and their households, would classify the same crime as an aggravated assault victimization (assuming that no personal property was stolen from the teller).
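To make the two rate conventions concrete, the following sketch computes NCVS-style rates per 1,000 persons or households and a UCR-style rate per 100,000 population; all counts and population figures are invented for illustration only.

```python
# Hypothetical illustration of the two rate conventions described above:
# NCVS victimization rates per 1,000 persons/households versus
# UCR offense rates per 100,000 population. All numbers are invented.

violent_victimizations = 5_200      # NCVS: victimizations of persons age 12+
persons_12_plus = 2_600_000
property_victimizations = 3_900     # NCVS: victimized households
households = 1_300_000

offenses_known_to_police = 4_100    # UCR: offenses reported by agencies
total_population = 3_000_000

ncvs_violent_rate = 1_000 * violent_victimizations / persons_12_plus
ncvs_property_rate = 1_000 * property_victimizations / households
ucr_rate = 100_000 * offenses_known_to_police / total_population

print(f"NCVS violent rate: {ncvs_violent_rate:.1f} per 1,000 persons age 12+")
print(f"NCVS property rate: {ncvs_property_rate:.1f} per 1,000 households")
print(f"UCR offense rate: {ucr_rate:.1f} per 100,000 population")
```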

This research paper has provided an overview of crime reports and statistics, which are used to convey an extensive amount of information about crime. This includes topics such as the extent of crime and the nature or characteristics of criminal offenses, as well as how the nature and characteristics of crime change over time. Furthermore, official crime reporting systems, such as the UCR, the NIBRS, and the NCVS, allow insight into the experiences of crime and victimization for specific groups and how they may or may not differ from others or over time. Understanding what crime reports and statistics are requires an understanding of the agencies that gather the data and publish the reports. Furthermore, one must comprehend the intricacies of the data collection to fully appreciate the strengths and weaknesses of what the data (and resulting reports and statistics) offer.

Data from the UCR and the NCVS are essential to an understanding of crime. Because crime is not a directly observable phenomenon, no single measure can adequately convey or describe information about its extent and characteristics. Like other nonobservable phenomena, such as the economy or the weather, crime cannot be captured by a single measure. One could not hope to understand the state of the economy by knowing only the unemployment rate. Neither could one fully grasp the condition of the weather by knowing only the humidity. Multiple measures are required for such phenomena, and these multiple measures are found in UCR data and the NCVS. Together, used in a complementary fashion, these data provide a more complete understanding of crime in the nation than either could alone.

References:

  • Addington, L. A. (2007). Using NIBRS to study methodological sources of divergence between the UCR and NCVS. In J. P. Lynch & L. A. Addington (Eds.), Understanding crime statistics: Revisiting the divergence of the NCVS and the UCR (pp. 225–250). New York: Cambridge University Press.
  • Barnett-Ryan, C. (2007). Introduction to the Uniform Crime Reporting Program. In J. P. Lynch & L. A. Addington (Eds.), Understanding crime statistics: Revisiting the divergence of the NCVS and the UCR (pp. 55–92). New York: Cambridge University Press.
  • Biderman, A. D., & Reiss, A. J. (1967). On exploring the “dark figure” of crime. Annals of the American Academy of Political and Social Science, 374, 1–15.
  • Bureau of Justice Statistics. (1989). Redesign of the National Crime Survey (NCJ Publication No. 111457). Washington, DC: U.S. Department of Justice.
  • Federal Bureau of Investigation. (2004). UCR: Uniform Crime Reporting handbook. Washington, DC: U.S. Department of Justice.
  • Kindermann, D., Lynch, J., & Cantor, D. (1997). Effects of the redesign on victimization estimates (NCJ Publication No. 164381). Washington, DC: U.S. Department of Justice.
  • Lehnen, R. G., & Skogan, W. G. (Eds.). (1981). The National Crime Survey working papers: Vol. 1. Current and historical perspectives (NCJ Publication No. 75374). Washington, DC: U.S. Department of Justice.
  • Lehnen, R. G., & Skogan, W. G. (Eds.). (1984). The National Crime Survey working papers: Vol. 2. Methodological studies (NCJ Publication No. 90307). Washington, DC: U.S. Department of Justice.
  • Lynch, J. P., & Addington, L. A. (Eds.). (2007). Understanding crime statistics: Revisiting the divergence of the NCVS and the UCR. New York: Cambridge University Press.
  • Maltz, M. D. (1999). Bridging gaps in police crime data (NCJ Publication No. 176365). Washington, DC: U.S. Department of Justice.
  • National Research Council of the National Academies. (2008, February). Surveying victims: Options for conducting the National Crime Victimization Survey. Retrieved December 28, 2013, from   http://books.nap.edu/openbook.php?record_id=12090&page=R1
  • President’s Commission on Law Enforcement and the Administration of Justice. (1967a). The challenge of crime in a free society. Washington, DC: U.S. Government Printing Office.
  • President’s Commission on Law Enforcement and the Administration of Justice. (1967b). Crime and its impact: An assessment. Washington, DC: U.S. Government Printing Office.
  • Rennison, C. M., & Rand, M. R. (2007). Introduction to the National Crime Victimization Survey. In J. P. Lynch & L. A. Addington (Eds.), Understanding crime statistics: Revisiting the divergence of the NCVS and the UCR (pp. 17–54). New York: Cambridge University Press.
  • U.S. Department of Justice. (1985). Blueprint for the future of the Uniform Crime Reporting Program. Washington, DC: Author.


Using Research to Improve Hate Crime Reporting and Identification

Hate crimes harm whole communities. They are message crimes that tell all members of a group—not just the immediate victims—that they are unwelcome and at risk.

The damage that bias victimization causes multiplies when victims and justice agencies don’t recognize or report hate crimes as such. In addition, in cases for which law enforcement agencies fail to respond to or investigate hate crimes, relationships between law enforcement and affected communities can suffer, and public trust in police can erode. [1]

While it is known that hate crimes are underreported throughout the United States, there is not a clear understanding of exactly why reporting rates are low, how low they are, and what might be done to improve them. An even more elementary question, with no single answer, is: What constitutes a hate crime? Different state statutes and law enforcement agencies have different answers to that question, which further complicates the task of identifying hate crimes and harmonizing hate crime data collection and statistics.

See "Hate Crimes: A Distinct Category."

A recent series of evidence-based research initiatives supported by the National Institute of Justice (NIJ) is helping to narrow this critical knowledge gap and illuminate a better path forward. The study findings fill in vital details on causes of hate crime underreporting in various communities, including

  • hate crime victims’ reluctance to engage with law enforcement;
  • victims’ and law enforcement agencies’ inability to recognize certain victimizations as hate crimes;
  • a very large deficit of hate crime reporting by law enforcement agencies of all sizes; and
  • variations in hate crime definitions across jurisdictions.

Significant insights to emerge from those studies include the following: [2]

  • A growing number of members of the Latino community, particularly those who recently immigrated to the United States, reported experiencing bias victimization. (But Black communities endure more hate crimes than any other racial or ethnic group.)
  • Many Latino individuals, especially immigrants, tend to report bias victimization only to friends and family. They are often highly reluctant to share incidents with law enforcement or other authorities.
  • LGBTQ+ community members also reported an elevated rate of bias victimization. Some victims hesitate to report hate crimes to authorities out of fear of reprisals from law enforcement or because, among other reasons, they don’t want their sexual orientation or gender identity exposed.
  • Many hate crimes, particularly those targeting the LGBTQ+ community, are the product of mixed motivations—for example, hate and theft. This likely results from a perception that certain victim groups are vulnerable and less likely to report the crimes.
  • Law enforcement officers often lack the training and knowledge needed to investigate, identify, and report hate crimes. The presence of a dedicated officer or unit enhances a law enforcement agency’s ability to identify, respond to, and report hate crimes.
  • Law enforcement agencies with policies in place that support hate crime investigation and enforcement are more likely to report investigating possible hate crimes in their jurisdiction.

In the end, knowledge gained from the NIJ-supported research on bias victimization and hate crime can strengthen hate crime recognition, reporting, and response.

See "Hate Crime vs. Bias Victimization."

Hate Crime Reporting Deficit Driven by Fear, Lack of Knowledge

Federal data captures roughly 1 in 31 hate crimes.

The disparity between the number of hate crime victimizations that actually occur and the number reported by law enforcement is vast and long-standing. As hate crimes continue to rise in the United States, especially in vulnerable populations, the search for ways to reduce that disparity becomes more urgent.

A representative sample of hate crime victimizations across the United States, collected from the National Crime Victimization Survey, revealed that only a small portion of all hate crimes find their way into official hate crime reporting. [3] An annual average of 243,770 hate crime victimizations of persons 12 or older occurred between 2010 and 2019. [4] In the same period, law enforcement agencies reported an annual average of 7,830 hate crimes to the FBI’s Hate Crime Statistics program. Those figures suggest that roughly 1 of every 31 hate crimes is captured in U.S. federal statistics.
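The "roughly 1 in 31" figure follows directly from the two annual averages just cited; a minimal check:

```python
# Recomputing the "1 in 31" ratio from the averages cited above.
ncvs_annual_hate_victimizations = 243_770  # NCVS annual average, 2010-2019
fbi_annual_reported_hate_crimes = 7_830    # FBI Hate Crime Statistics, same period

ratio = ncvs_annual_hate_victimizations / fbi_annual_reported_hate_crimes
print(f"About 1 of every {round(ratio)} hate crime victimizations is captured")  # ~31
```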

The FBI has published hate crime statistics provided by law enforcement since 1996. However, submitting hate crime data to the FBI is voluntary, and many state and local law enforcement agencies either report that their jurisdictions experience no hate crimes or do not report any hate crime data. [5]

Three Conditions for a Hate Crime to Enter National Statistics

The overall investigation and prosecution of hate crimes suffer from the prevalence of inaccurate hate crime data. The COVID-19 Hate Crimes Act of 2021 acknowledges that incomplete data from federal, state, and local jurisdictions have hindered our understanding of hate crimes. [6]  Without a full, data-informed understanding of the problem, law enforcement and communities will be unable to provide an adequate response.

Three steps must occur for federal statistics to capture a hate crime incident.

  • A victim, a victim’s friend or family member, or another person with knowledge of the incident must report the incident to law enforcement.
  • Upon receiving an incident report, law enforcement must recognize and record it as a hate crime by establishing sufficient evidence through an investigation. [7]
  • The law enforcement agency must report the hate crime to the FBI’s Uniform Crime Reporting Program.

Reporting barriers are present at each step of the process, which results in chronic and acute underreporting of hate crimes.

Dealing With Divergent Hate Crime Definitions

Although the FBI has a definition for hate crimes, its definition affects only data-reporting obligations to the FBI; it has no impact on states’ own criminal code definitions. State and local hate crime definitions vary widely in terms of whom they protect and the types of offenses they include. [8] The varying hate crime definitions make it challenging to obtain an even-handed and reliable summary of hate crime statistics across jurisdictions. When recording cases for the FBI, law enforcement agencies are required to adhere to the federal definition of offenses and protected groups. [9] An offense that constitutes a federal hate crime may not constitute a hate crime in a state or local jurisdiction; the reverse could also be true. As a result, hate crime counts based on jurisdiction-specific definitions are not always comparable to counts reported by the FBI.

Understanding Victim Reluctance to Report Hate Crimes

NIJ-sponsored research on hate crimes that affect Latino and LGBTQ+ communities suggests that many factors influence whether individuals who experience or witness hate crimes report them to law enforcement. Those factors vary across communities.

Researchers at Northeastern University, the University of Massachusetts Lowell, the University of Texas Medical Branch, and the University of Delaware conducted a study of bias victimization affecting three large, geographically diverse Latino populations. The study found that victims who experienced bias victimization overwhelmingly sought help from friends or family and not from formal authorities, particularly law enforcement. [10] Reporting rates to formal authorities were similar for nonimmigrant and immigrant Latinos, though nonimmigrant Latinos were more likely than immigrant Latinos to report experiencing bias victimization. It is important to note that the Latino community in the United States is large and varied, and bias victimization varies in nature and degree across Latino communities. Many Latino study participants said that their past experiences as victims of personal or indirect discrimination have made them less willing to report their bias victimization to authorities or to trust those outside of their community.

Among Latino populations, several factors influenced reluctance to contact law enforcement about hate crimes, including concern over retaliation by the offending party, harassment by police, and worries over the victim’s immigration status.

Florida International University conducted a study of LGBTQ+ Latinos in Miami, Florida, that established an additional factor that inhibited victim reporting of hate crimes to law enforcement: concern about the consequences of revealing their sexual orientation or gender identity. [11] The study also found that friends’ encouragement to report a crime was “by far” the strongest predictor of hate crime reporting, which increased the likelihood of the victim reporting the crime at least ten-fold.

Multiple Sources of Initial Hate Crime Reporting to Law Enforcement

Although it is vital for victims to report hate crimes, it is not the only way that law enforcement finds out about these types of crimes. The National Hate Crime Investigations Study found that, of all incidents reported to law enforcement in a nationally representative sample, victims reported 45 percent of those hate crimes and other individuals reported 52 percent. [12]

The Miami-based study reported that criminal justice practitioners perceived that law enforcement initiates most hate crime cases in response to media coverage of bias-motivated events rather than in response to victims’ reporting. [13]

Low rates of formal reporting obscure the significant impact that hate crimes have on victims. Bias victimization can be as damaging to Latino victims as other types of victimization, such as assault or theft, or more so. Among the three large metro Latino communities that the Latino bias research study examined, bias victimization had a greater mental health impact on community members than other forms of victimization. [14] In fact, bias victimization is unique in its negative impact on mental health. This has notable implications for both prevention and intervention within the community.

The Miami-Dade study of LGBTQ+ Latino individuals reported these other consequences of bias victimization (perhaps influenced by mental health impacts):

  • Victims began to avoid LGBTQ+ venues or friends (13 percent).
  • Victims had to change their housing (23 percent).
  • Victims tried to act more “straight” (35 percent).

The strongest predictor of a victim changing their housing was that the victim experienced a hate crime involving the use of a weapon.

Recognizing Hate Crime Incidents

It isn’t enough for a law enforcement agency to receive a report of a hate crime incident if the agency doesn’t recognize and report the incident as such. But identifying a bias motivation can be challenging. It’s not always clear what motivated a person to commit a crime, and other factors unrelated to bias may mask an incident’s hate-based status. Further, it’s not typical for law enforcement to be required to identify a motive.

Mixed-Motive Hate Offenses: Choosing Victims for Their Vulnerability

Bias motives can emerge during disputes or incidents that are unrelated to bias, which potentially complicates law enforcement’s ability to identify a motive. NIJ-sponsored research by the National Consortium for the Study of Terrorism and Responses to Terrorism (START), a research center at the University of Maryland, found that these mixed-motive hate offenses are common. [15] START developed a database known as BIAS (the Bias Incidents and Actors Study), which collected information from 960 adult individuals who committed hate crimes from 1990 to 2018.

BIAS found that almost a quarter of hate crimes targeting victims due to their sexual orientation or gender had mixed motives. Additionally, nearly all hate crimes that targeted persons because of their age or physical or mental disabilities had mixed motives, such as a combination of hate and theft. The researchers noted that, in those cases in particular, the crime likely results from the fact that the person committing it perceives that certain victim groups are vulnerable and less inclined to report incidents to authorities. The study also found that mixed-motive hate crimes were more likely to be spontaneous or otherwise unpredictable than crimes motivated only by bias.

Varying Hate-Based Forms of Messaging – How to Identify a Crime as Hate-Motivated

A University of New Hampshire research team identified the top four indicators of hate motivation that law enforcement reported in the National Hate Crime Investigations Study (NHCIS):

  • Hate-related verbal comments (reported by victims in 51.83 percent of hate crimes in the NHCIS database)
  • Victim belief that they were targeted because of hate or bias (28.96 percent)
  • Hate-related written comments (24.75 percent)
  • Hate-related drawings or graffiti found at the crime scene (23.39 percent) [16]

Characteristics of Primary Suspects in Hate Crime Investigations

The NHCIS examined characteristics of suspects from a sample of 783 hate crime investigations in 2018 where law enforcement identified a suspect. [17] The primary suspects were white in nearly three-quarters (73.69 percent) of those cases. See Table 1 for a breakdown of characteristics of primary suspects.

Table 1 note: Percentages for primary suspect gender, race/ethnicity, and age are presented for cases in which the information was known. Unknown/missing data: gender = 59 (7.27%, weighted); race/ethnicity = 145 (18.41%, weighted); age = 221 (29.9%, weighted).

Varying Traits of Those Who Commit Hate Crimes

The START database study, BIAS, found that the behaviors, experiences, and characteristics of those who commit hate crimes in the United States varied significantly.

  • Some offenders were fully engaged in the world of bigotry and hate when they committed a bias-based offense, while others were acting on bias themes that pervade U.S. communities.
  • Some committed crimes of opportunity, while others carefully premeditated their acts.
  • Some were susceptible to negative peer influences or were struggling with mental health issues or substance abuse. [18]

The study also established that the characteristics of persons who commit hate crimes varied considerably, depending on the nature of the prejudice involved. For example:

  • Those who committed hate crimes based on their victims’ religious beliefs were often older, better educated, and had higher rates of military experience than those who committed hate crimes based on other motivations.
  • Those motivated by religious bias displayed high rates of mental health concerns and were most likely to plan or commit hate crimes.
  • Those motivated by bias based on sexual orientation, gender, or gender identity were often young, unmarried, and unemployed. They were also most likely to commit hate crimes with accomplices and while under the influence of drugs or alcohol.

Agency Reporting of Zero Hate Crimes

Data analyses from the NIJ-supported NHCIS showed that many U.S. law enforcement agencies, regardless of size, reported that they conducted no hate crime investigations in 2018. This is consistent with the FBI’s assessment from hate crime statistics provided by law enforcement agencies. According to the FBI, generally, around 85 percent of law enforcement agencies said that no hate crimes occurred in their jurisdiction. [19] That is good news if no hate crimes occurred, but it is problematic if hate crimes are occurring without being reported or investigated as such.

The hate crime investigations study authors noted that although over half of large agencies (100+ officers) reported no hate crime investigations in 2018, several large agencies reported more than 50 hate crime investigations that year. Based on an assessment of case summaries, the researchers concluded that better documentation increased the number of investigations. They also found that agency policies and procedures increased the number of hate crime investigations.

Implications and Recommendations: How Research Can Enhance Hate Crime Reporting, Investigations

NIJ-supported hate crime research identified several proposals to improve hate crime investigation and reporting. One promising area is for agencies to implement certain hate crime policies and practices. The NHCIS surveyed agencies on whether they had implemented five specific policies and practices. The study found four of the five were significantly related to an increased number of reported hate crime investigations, even when controlling for agency type and size: [20]

  • Assigning a dedicated officer or unit to investigate hate crimes.
  • Reviewing procedures for cases with possible hate or bias motivation.
  • Developing written policy guidelines for investigating hate crimes.
  • Conducting outreach to local groups on hate crimes.

Researchers found no significant differences in hate crime reporting rates between agencies that had provided officers with training on hate crime investigations and those that had not; however, the study did not look at the nature and quality of the hate crime training the officers received. The NHCIS noted that officers with minimal training are often tasked with identifying hate crimes based on their state’s legal definition. The report also noted that bias-based crimes are often hard to classify, even with good training. More information is needed to determine the optimal type and focus of hate crime training. The Latino bias studies identified a need both to improve the identification of hate crimes and to increase community education about them. The research team identified the following policy and process needs, among others, to improve identification and reporting:

  • Enhance police training about risks associated with bias victimization in Latino communities.
  • Increase education and awareness about bias victimization among Latino population groups.
  • Build support for community-based agencies to facilitate the formal process of helping victims and reporting hate crimes to law enforcement. [21]

The study of LGBTQ+ Latinos in Miami identified the following recommendations to improve how law enforcement agencies report and identify anti-LGBTQ hate crimes:

  • Establish a hate crime detection protocol for emergency dispatchers, patrol officers, police detectives, case screeners, and prosecutors.
  • Develop a specialized workforce to identify, tackle, and prevent hate crimes; the workforce should be composed of prosecutors, detectives, patrol officers, victims’ liaisons, emergency dispatchers, researchers, and community experts.
  • Create a dedicated support center for hate crime victims.
  • Recruit police officers and prosecutors from the LGBTQ+ community.
  • Develop formal policies to affirm and support transgender colleagues, victims, and witnesses.
  • Encourage cooperation by pursuing victim engagement alternatives to subpoenas. Train criminal justice practitioners to improve victim engagement and hate crime detection, evidence gathering, and case screening.
  • Engage in effective communication and awareness-building campaigns, such as initiatives to encourage victims to tell friends about the incident, as well as encouraging friends to persuade a victim to report the crime. [22]

It is critical for both communities and law enforcement to improve their methods of reporting and identifying hate crimes. Only then will they be able to prevent and respond to incidents and link victims to the services they need. Doing so will also enable the field to develop a more comprehensive understanding of the scope and nature of the problem. The current gap between the number of hate crime victimizations and the number of hate crimes that law enforcement reports and investigates threatens the relationship between law enforcement and targeted communities. Chronic, widespread underreporting of hate crimes also greatly reduces the likelihood of justice for victims.

Findings from NIJ-supported research provide important insight into the causes of underreporting and under-identification of hate crimes. These studies also offer policy and practice recommendations to improve how law enforcement agencies report and identify hate crimes.

Sidebar: Hate Crimes: A Distinct Crime Category

The codification of hate crime laws began in the 1980s, as jurisdictions acted to redress the harm, beyond victim impact, that bias-based victimizations inflict on society. [23] In a 2022 solicitation for further hate crime research, NIJ noted,

Hate crimes are a distinct category of crime that have a broader effect than most other kinds of crime because the victims are not only the crime’s immediate target but also others in the targeted group. [24]

Hate crimes are traditional criminal offenses with an added element of bias motivation. They are not limited to crimes against persons; the crimes can target businesses, religious institutions, other organizations, and society at large. Additionally, hate crimes are not limited to one type of motivating prejudice. The FBI defines a hate crime as:

a criminal offense committed against a person or property which is motivated, in whole or in part, by the offender’s bias against race, religion, disability, ethnic or national origin group, or sexual orientation group. [25]

Hate crimes can be violent or nonviolent, but the acts must be recognized criminal offenses even if the bias element is set aside. Yet the wide net cast by hate crime laws has not resulted in high rates of hate crime prosecution or punishment. As noted in an NIJ-sponsored report on findings from the National Hate Crime Investigations Study, only 4 percent of hate crimes investigated by law enforcement resulted in someone being criminally charged. [26]


Sidebar: Hate Crime vs. Bias Victimization

Hate crimes are a form of bias victimization. A criminal offense is a core element of every hate crime. However, not every bias victimization is a crime. In simple math terms, hate crimes are a subset of all bias victimizations.

The FBI’s Uniform Crime Reporting Program (UCR) defines a hate crime as a “committed criminal offense which is motivated, in whole or in part, by the offender’s bias(es)” against a

  • race, ethnicity, or ancestry
  • religion
  • disability
  • gender
  • gender identity
  • sexual orientation [27]

State and local jurisdictions have their own hate crime statutes proscribing some or all of those or other types of bias. An act of bias victimization can be, but need not be, a criminal offense.


[note 1]  International Association of Chiefs of Police,  Responding to Hate Crimes: A Police Officer’s Guide to Investigation and Prevention  (Washington, DC: U.S. Department of Justice, Bureau of Justice Administration, 1999).

[note 2]  The five NIJ-supported hate crime study reports covered in this article are: Carlos A. Cuevas, et al.,  Understanding and Measuring Bias Victimization Against Latinos , October 2019, NCJ 253430; Carlos A. Cuevas, et al.,  Longitudinal Examination of Victimization Experiences of Latinos (LEVEL): Extending the Bias Victimization Study , August 2021, NCJ 30167; Michael A. Jensen, Elizabeth A. Yates, and Sheehan E. Kane,  A Pathway Approach to the Study of Bias Crime Offenders , February 2021, NCJ 300114; Besiki Luka Kutateladze,  Anti-LGBTQ Hate Crimes in Miami: Research Summary and Policy Recommendations , September 2021, NCJ 302239; Lisa M. Jones, Kimberly J. Mitchell, and Heather A. Turner,  U.S. Hate Crime Investigation Rates and Characteristics: Findings from the National Hate Crime Investigations Study (NHCIS) , December 2021, NCJ 304531.

[note 3]  Grace Kena and Alexandra Thompson,  Hate Crime Victimization, 2005–2019  (Washington, DC: U.S. Department of Justice (DOJ), Bureau of Justice Statistics (BJS), 2021).

[note 4]  Erica Smith,  Hate Crime Recorded by Law Enforcement, 2010–2019  (Washington, DC: U.S. DOJ, BJS, 2021).

[note 5]  Smith,  Hate Crime Recorded by Law Enforcement .

[note 6]  COVID-19 Hate Crimes Act, Pub. L. No. 117-17, 123 Stat. 2835 and 135 Stat. 265, 266, 267, 268, 269, 270, 271 and 272 (2021).

[note 7] Global Law Enforcement Support Section (GLESS) Crime and Law Enforcement Statistics Unit (CLESU),  Hate Crime Data Collection Guidelines and Training Manual  (Washington, DC: Federal Bureau of Investigation: Criminal Justice Information Division Uniform Crime Reporting Program, 2022).

[note 8] U.S. Department of Justice, “ Laws and Policies .”

[note 9] GLESS CLESU,  Hate Crime Data Collection Guidelines and Training Manual.

[note 10] Cuevas, et al.,  Understanding and Measuring Bias Victimization Against Latinos ; Cuevas, et al.,  Longitudinal Examination of Victimization Experiences of Latinos (LEVEL) .

[note 11] Kutateladze,  Anti-LGBTQ Hate Crimes in Miami .

[note 12] Jones, Mitchell, and Turner,  U.S. Hate Crime Investigation Rates and Characteristics .

[note 13] The practitioners were prosecutors who handled hate crime cases in the Miami-Dade State Attorney’s Office, detectives from the Miami-Dade Police Department, one victim liaison from the police department, and one from the prosecutor’s office; Kutateladze,  Anti-LGBTQ Hate Crimes in Miami .

[note 14] The three-community bias study survey sampled Latino community members generally, not limited to self-identified bias victims. Respondents reported on their own bias experiences. Overall, 52.9 percent of participants experienced some form of bias event in their lifetime.

[note 15] Jensen, Yates, and Kane,  A Pathway Approach to the Study of Bias Crime Offende rs .

[note 16] Jones, Mitchell, and Turner,  U.S. Hate Crime Investigation Rates and Characteristics .

[note 17] Jones, Mitchell, and Turner,  U.S. Hate Crime Investigation Rates and Characteristics .

[note 18] Jensen, Yates, and Kane,  A Pathway Approach to the Study of Bias Crime Offenders .

[note 19] Smith,  Hate Crime Recorded by Law Enforcement .

[note 20] Jones, Mitchell, and Turner,  U.S. Hate Crime Investigation Rates and Characteristics .

[note 21] Cuevas, et al.,  Understanding and Measuring Bias Victimization Against Latinos ; Cuevas, et al.,  Longitudinal Examination of Victimization Experiences of Latinos (LEVEL) .

[note 22] Kutateladze,  Anti-LGBTQ Hate Crimes in Miami .

[note 23] Ryken Grattet and Valerie Jenness,  Making Hate a Crime: From Social Movements to Law Enforcement  (New York, NY: Russell Sage Foundation, 2001).

[note 24] NIJ FY22 Research and Evaluation on Hate Crime  (Washington, DC: U.S. Department of Justice, National Institute of Justice, 2022).

[note 25] FBI, “Defining a Hate Crime.”

[note 26] Lisa M. Jones, Kimberly J. Mitchell, and Heather A. Turner,  U.S. Hate Crime Investigation Rates and Characteristics: Findings from the National Hate Crime Investigations Study (NHCIS) , December 2021, NCJ 304531.

[note 27] FBI Hate Crime Statistics Reports, UCR, “Definition of a Hate Crime.”

About the authors

Kaitlyn Sill, Ph.D., is a Social Science Research Analyst at the National Institute of Justice. Paul A. Haskins, JD, is a contract Writer-Editor supporting the National Institute of Justice.


  • Open access
  • Published: 29 April 2021

Crime forecasting: a machine learning and computer vision approach to crime prediction and prevention

  • Neil Shah 1 ,
  • Nandish Bhagat 1 &
  • Manan Shah   ORCID: orcid.org/0000-0002-8665-5010 2  

Visual Computing for Industry, Biomedicine, and Art volume  4 , Article number:  9 ( 2021 ) Cite this article

70k Accesses

53 Citations

4 Altmetric

Metrics details

A crime is a deliberate act that can cause physical or psychological harm, as well as property damage or loss, and can lead to punishment by a state or other authority according to the severity of the crime. The number and forms of criminal activities are increasing at an alarming rate, forcing agencies to develop efficient methods to take preventive measures. In the current scenario of rapidly increasing crime, traditional crime-solving techniques are unable to deliver results, being slow paced and less efficient. Thus, if we can come up with ways to predict crime, in detail, before it occurs, or come up with a “machine” that can assist police officers, it would lift the burden of police and help in preventing crimes. To achieve this, we suggest including machine learning (ML) and computer vision algorithms and techniques. In this paper, we describe the results of certain cases where such approaches were used, and which motivated us to pursue further research in this field. The main reason for the change in crime detection and prevention lies in the before and after statistical observations of the authorities using such techniques. The sole purpose of this study is to determine how a combination of ML and computer vision can be used by law agencies or authorities to detect, prevent, and solve crimes at a much more accurate and faster rate. In summary, ML and computer vision techniques can bring about an evolution in law agencies.

Introduction

Computer vision is a branch of artificial intelligence that trains the computer to understand and comprehend the visual world, and by doing so, creates a sense of understanding of a machine’s surroundings [ 1 , 2 ]. It mainly analyzes data of the surroundings from a camera, and thus its applications are significant. It can be used for face recognition, number plate recognition, augmented and mixed realities, location determination, and identifying objects [ 3 ]. Research is currently being conducted on the formation of mathematical techniques to recover and make it possible for computers to comprehend 3D images. Obtaining the 3D visuals of an object helps us with object detection, pedestrian detection, face recognition, Eigenfaces active appearance and 3D shape models, personal photo collections, instance recognition, geometric alignment, large databases, location recognition, category recognition, bag of words, part-based models, recognition with segmentation, intelligent photo editing, context and scene understanding, and large image collection and learning, image searches, recognition databases, and test sets. These are only basic applications, and each category mentioned above can be further explored. In ref. [ 4 ], VLFeat is introduced, which is a library of computer vision algorithms that can be used to conduct fast prototyping in computer vision research, thus enabling a tool to obtain computer vision results much faster than anticipated. Considering face detection/human recognition [ 5 ], human posture can also be recognized. Thus, computer vision is extremely attractive for visualizing the world around us.

Machine learning (ML) is an application that provides a system with the ability to learn and improve automatically from past experiences without being explicitly programmed [ 6 , 7 , 8 ]. After viewing the data, an exact pattern or information cannot always be determined [ 9 , 10 , 11 ]. In such cases, ML is applied to interpret the exact pattern and information [ 12 , 13 ]. ML pushes forward the idea that, by providing a machine with access to the right data, the machine can learn and solve both complex mathematical problems and some specific problems [ 14 , 15 , 16 , 17 ]. In general, ML is categorized into two parts: (1) supervised ML and (2) unsupervised ML [ 18 , 19 ]. In supervised learning, the machine is trained on the basis of a predefined set of training examples, which facilitates its capability to obtain precise and accurate conclusions when new data are given [ 20 , 21 ]. In unsupervised learning, the machine is given a set of data, and it must find common patterns and relationships in the data on its own [ 22 , 23 ]. Neural networks, which are important tools used in supervised learning, have been studied since the 1980s [ 24 , 25 ]. In ref. [ 26 ], the author suggested that different aspects are needed to obtain an exit from nondeterministic polynomial (NP)-completeness, and architectural constraints are insufficient. However, in ref. [ 27 ], it was proved that NP-completeness problems can be extended to neural networks using sigmoid functions. Although such research has attempted to demonstrate the various aspects of new ML approaches, how accurate are the results [ 28 , 29 , 30 ]?
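
As a loose illustration of this supervised/unsupervised distinction (not drawn from any of the cited studies), the following scikit-learn sketch trains a classifier on labelled synthetic data and, separately, clusters the same data without labels; all features and labels are hypothetical.

```python
# Illustrative sketch only: supervised vs. unsupervised learning on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                  # two hypothetical features per record
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # hypothetical binary label (e.g., "hotspot")

# Supervised: learn a mapping from labelled examples, then score on held-out data.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("supervised accuracy:", clf.score(X_te, y_te))

# Unsupervised: no labels; the algorithm looks for structure (here, two clusters).
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", np.bincount(clusters))
```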

Although various crimes and their underlying nature seem to be unpredictable, how unforeseeable are they? In ref. [ 31 ], the authors pointed out that as society and the economy evolve and produce new types of crime, the need for a prediction system has grown. In ref. [ 32 ], a crime trend prediction technique based on the Mahalanobis distance and dynamic time warping is given, delivering the possibility of predicting crime and apprehending the actual culprit. As described in ref. [ 33 ], in 1998, the United States National Institute of Justice granted five grants for crime forecasting as an extension to crime mapping. Applications of crime forecasting are currently being used by law enforcement in the United States, the United Kingdom, the Netherlands, Germany, and Switzerland [ 34 ]. Nowadays, criminal intellect, with the help of advances in technology, is improving with each passing year. Consequently, it has become necessary for us to provide the police department and the government with the means of a new and powerful machine (a set of programs) that can help them in their process of solving crimes. The main aim of crime forecasting is to predict crimes before they occur, and thus, the importance of using crime forecasting methods is extremely clear. Furthermore, the prediction of crimes can sometimes be crucial because it may potentially save the life of a victim, prevent lifelong trauma, and avoid damage to private property. It may even be used to predict possible terrorist crimes and activities. Finally, if we implement predictive policing with a considerable level of accuracy, governments can apply other primary resources such as police manpower, detectives, and funds in other fields of crime solving, thereby curbing the problem of crime with double the power.

In this paper, we aim to make an impact by using both ML algorithms and computer vision methods to predict both the nature of a crime and possibly pinpoint a culprit. Beforehand, we questioned whether the nature of the crime was predictable. Although it might seem impossible from the outside, categorizing every aspect of a crime is quite possible. We have all heard that every criminal has a motive. That is, if we use motive as a judgment for the nature of a crime, we may be able to achieve a list of ways in which crimes can be categorized. Herein, we discuss a theory where we combine ML algorithms to act as a database for all recorded crimes in terms of category, along with providing visual knowledge of the surroundings through computer vision techniques, and using the knowledge of such data, we may predict a crime before it occurs.

Present technologies used in crime detection and prediction

Crime forecasting refers to the basic process of predicting crimes before they occur. Tools are needed to predict a crime before it occurs. Currently, there are tools used by police to assist in specific tasks such as listening in on a suspect’s phone call or using a body cam to record some unusual illegal activity. Below we list some such tools to better understand where they might stand with additional technological assistance.

One good way of tracking phones is through the use of a stingray [ 35 ], which is a new frontier in police surveillance and can be used to pinpoint a cellphone location by mimicking cellphone towers and broadcasting the signals to trick cellphones within the vicinity to transmit their location and other information. An argument against the usage of stingrays in the United States is that it violates the Fourth Amendment. This technology is used in 23 states and in the District of Columbia. In ref. [ 36 ], the authors provide insight on how this is more than just a surveillance system, raising concerns about privacy violations. In addition, the Federal Communications Commission became involved and ultimately urged the manufacturer to meet two conditions in exchange for a grant: (1) “The marketing and sale of these devices shall be limited to federal, state, local public safety and law enforcement officials only” and (2) “State and local law enforcement agencies must advance coordinate with the FBI the acquisition and use of the equipment authorized under this authorization.” Although its use is worthwhile, its implementation remains extremely controversial.

A very popular method that has been in practice since the inception of surveillance is “the stakeout”. A stakeout is the most frequently practiced surveillance technique among police officers and is used to gather information on all types of suspects. In ref. [ 37 ], the authors discuss the importance of a stakeout by stating that police officers witness an extensive range of events about which they are required to write a report. Such criminal acts are observed during stakeouts or patrols; observations of weapons, drugs, and other evidence during house searches; and descriptions of their own behavior and that of the suspect during arrest. Stakeouts are extremely useful, and are considered 100% reliable, with the police themselves observing the notable proceedings. However, are they actually 100% accurate? All officers are humans, and all humans are subject to fatigue. The major objective of a stakeout is to observe wrongful activities. Is there a tool that can substitute its use? We will discuss this point herein.

Another way to conduct surveillance is by using drones, which help in various fields such as mapping cities, chasing suspects, investigating crime scenes and accidents, traffic management and flow, and search and rescue after a disaster. In ref. [ 38 ], legal issues regarding the use of drones and airspace distribution problems are described. Legal issues include the privacy concerns raised by the public, with the police gaining increasing power and authority. Airspace distribution raises concerns about how high a drone is allowed to go.

Other surveillance methods include face recognition, license plate recognition, and body cams. In ref. [ 39 ], the authors indicated that facial recognition can be used to obtain the profile of suspects and analyze it from different databases to obtain more information. Similarly, a license plate reader can be used to access data about a car possibly involved in a crime. They may even use body cams to see more than what the human eye can see, meaning that the reader observes everything a police officer sees and records it. Normally, when we see an object, we cannot recollect the complete image of it. In ref. [ 40 ], the impact of body cams was studied in terms of officer misconduct and domestic violence when the police are making an arrest. Body cams are thus being worn by patrol officers. In ref. [ 41 ], the authors also mentioned how protection against wrongful police practices is provided. However, the use of body cams does not stop here; another primary reason for having a body camera on at all times is to record the happenings in front of the wearer in the hope of capturing useful events during daily activities or during important operations.

Although each of these methods is effective, one point they share in common is that they all work individually, and while the police can use any of these approaches individually or concurrently, having a machine that is able to incorporate the positive aspects of all of these technologies would be highly beneficial.

ML techniques used in crime prediction

In ref. [ 42 ], a comparative study was carried out between violent crime patterns from the Communities and Crime Unnormalized Dataset versus actual crime statistical data using the open source data mining software Waikato Environment for Knowledge Analysis (WEKA). Three algorithms, namely, linear regression, additive regression, and decision stump, were implemented using the same finite set of features on communities and actual crime datasets. Test samples were randomly selected. The linear regression algorithm could handle randomness to a certain extent in the test samples and thus proved to be the best among all three selected algorithms. The scope of the project was to prove the efficiency and accuracy of ML algorithms in predicting violent crime patterns and other applications, such as determining criminal hotspots, creating criminal profiles, and learning criminal trends.

When considering WEKA [ 43 ], the integration of a new graphical interface called Knowledge Flow is possible, which can be used as an alternative to the Explorer interface. It provides a more concentrated view of data mining in association with the process orientation, in which individual learning components (represented by Java beans) are used graphically to show a certain flow of information. The authors then describe another graphical interface called an experimenter, which, as the name suggests, is designed to compare the performance of multiple learning schemes on multiple data sets.

In ref. [ 34 ], the potential of applying a predictive analysis of crime forecasting in an urban context is studied. Three types of crime, namely, home burglary, street robbery, and battery, were aggregated into grids of 200 m × 250 m and retrospectively analyzed. Based on the crime data of the previous 3 years, an ensemble model was applied to synthesize the results of logistic regression and neural network models in order to obtain fortnightly and monthly predictions for the year 2014. The predictions were evaluated based on the direct hit rate, precision, and prediction index. The results of the fortnightly predictions indicate that by applying a predictive analysis methodology to the data, it is possible to obtain accurate predictions. They concluded that the results can be improved remarkably by comparing the fortnightly predictions with the monthly predictions with a separation between day and night.
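
The exact ensemble used in that study is not specified here, but a minimal sketch of the general idea (averaging the per-cell hotspot probabilities of a logistic regression and a small neural network) might look as follows; the function name, feature matrices, and labels are assumptions rather than the authors' implementation.

```python
# Hedged sketch: per-grid-cell ensemble of logistic regression + neural network.
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

def ensemble_hotspot_probs(X_train, y_train, X_next):
    """Average the predicted crime probabilities of two base models per grid cell."""
    lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    nn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X_train, y_train)
    return (lr.predict_proba(X_next)[:, 1] + nn.predict_proba(X_next)[:, 1]) / 2.0

# Hypothetical usage: X_train holds per-cell features from previous periods,
# y_train marks whether a burglary occurred, X_next holds features for the next fortnight.
# probs = ensemble_hotspot_probs(X_train, y_train, X_next)
```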

In ref. [ 44 ], crime predictions were investigated based on ML. Crime data of the last 15 years in Vancouver (Canada) were analyzed for prediction. This machine-learning-based crime analysis involves the collection of data, data classification, identification of patterns, prediction, and visualization. K-nearest neighbor (KNN) and boosted decision tree algorithms were also implemented to analyze the crime dataset. In their study, a total of 560,000 crime records between 2003 and 2018 were analyzed, and crime prediction with an accuracy of between 39% and 44% was obtained by predicting the crime using ML algorithms. The accuracy was low for a prediction model, but the authors concluded that the accuracy can be increased or improved by tuning both the algorithms and crime data for specific applications.

In ref. [ 45 ], an ML approach is presented for the prediction of crime-related statistics in Philadelphia, United States. The problem was divided into three parts: determining whether a crime occurs, the occurrence of crime, and the most likely crime. Algorithms such as logistic regression, KNN, ordinal regression, and tree methods were used to train the datasets to obtain detailed quantitative crime predictions with greater significance. They also presented a map for crime prediction with different crime categories in different areas of Philadelphia for a particular time period with different colors indicating each type of crime. Different types of crimes ranging from assaults to cyber fraud were included to match the general pattern of crime in Philadelphia for a particular interval of time. Their algorithm was able to predict whether a crime will occur with an astonishing 69% accuracy, as well as the number of crimes ranging from 1 to 32 with 47% accuracy.

In ref. [ 46 ], the authors analyzed a dataset consisting of several crimes and predicted the type of crime that may occur in the near future depending on various conditions. ML and data science techniques were used for crime prediction in a crime dataset from Chicago, United States. The crime dataset consists of information such as the crime location description, type of crime, date, time, and precise location coordinates. Different combinations of models, such as KNN classification, logistic regression, decision trees, random forest, a support vector machine (SVM), and Bayesian methods were tested, and the most accurate model was used for training. The KNN classification proved to be the best with an accuracy of approximately 0.787. They also used different graphs that helped in understanding the various characteristics of the crime dataset of Chicago. The main purpose of this paper is to provide an idea of how ML can be used by law enforcement agencies to predict, detect, and solve crime at a much better rate, which results in a reduction in crime.
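
The following scikit-learn sketch mirrors the kind of model comparison described above, using cross-validation on synthetic data as a stand-in for the Chicago dataset; the features, preprocessing, and hyperparameters are assumptions, not the authors' setup.

```python
# Hedged sketch of a multi-model comparison with 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for encoded crime records (location, time, type features).
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6, random_state=0)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(),
    "NaiveBayes": GaussianNB(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```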

In ref. [ 47 ], a graphical user interface-based prediction of crime rates using a ML approach is presented. The main focus of this study was to investigate machine-learning-based techniques with the best accuracy in predicting crime rates and explore its applicability with particular importance to the dataset. Supervised ML techniques were used to analyze the dataset to carry out data validation, data cleaning, and data visualization on the given dataset. The results of the different supervised ML algorithms were compared to predict the results. The proposed system consists of data collection, data preprocessing, construction of a predictive model, dataset training, dataset testing, and a comparison of algorithms, as shown in Fig.  1 . The aim of this study is to prove the effectiveness and accuracy of a ML algorithm for predicting violent crimes.

Figure 1. Dataflow diagram

In ref. [ 48 ], a feature-level data fusion method based on a deep neural network (DNN) is proposed to accurately predict crime occurrence by efficiently fusing multi-model data from several domains with environmental context information. The dataset consists of data from an online database of crime statistics from Chicago, demographic and meteorological data, and images. Crime prediction methods utilize several ML techniques, including a regression analysis, kernel density estimation (KDE), and SVM. Their approach mainly consisted of three phases: collection of data, analysis of the relationship between crime incidents and collected data using a statistical approach, and lastly, accurate prediction of crime occurrences. The DNN model consists of spatial features, temporal features, and environmental context. The SVM and KDE models had accuracies of 67.01% and 66.33%, respectively, whereas the proposed DNN model had an astonishing accuracy of 84.25%. The experimental results showed that the proposed DNN model was more accurate in predicting crime occurrences than the other prediction models.
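
The paper's exact architecture is not reproduced here; the PyTorch sketch below only illustrates the general idea of feature-level fusion, with three input branches (spatial, temporal, environmental context) whose layer sizes and output head are assumptions.

```python
# Minimal feature-level fusion DNN sketch (branch sizes are illustrative only).
import torch
import torch.nn as nn

class FusionDNN(nn.Module):
    def __init__(self, n_spatial, n_temporal, n_context):
        super().__init__()
        self.spatial = nn.Sequential(nn.Linear(n_spatial, 32), nn.ReLU())
        self.temporal = nn.Sequential(nn.Linear(n_temporal, 32), nn.ReLU())
        self.context = nn.Sequential(nn.Linear(n_context, 32), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(96, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, xs, xt, xc):
        # Concatenate the three branch embeddings, then predict a crime probability.
        fused = torch.cat([self.spatial(xs), self.temporal(xt), self.context(xc)], dim=1)
        return torch.sigmoid(self.head(fused))

model = FusionDNN(n_spatial=10, n_temporal=8, n_context=6)
prob = model(torch.randn(4, 10), torch.randn(4, 8), torch.randn(4, 6))
print(prob.shape)  # torch.Size([4, 1])
```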

In ref. [ 49 ], the authors mainly focused on the analysis and design of ML algorithms to reduce crime rates in India. ML techniques were applied to a large set of data to determine the pattern relations between them. The research was mainly based on providing a prediction of crime that might occur based on the occurrence of previous crime locations, as shown in Fig.  2 . Techniques such as Bayesian neural networks, the Levenberg Marquardt algorithm, and a scaled algorithm were used to analyze and interpret the data, among which the scaled algorithm gave the best result in comparison with the other two techniques. A statistical analysis based on the correlation, analysis of variance, and graphs proved that with the help of the scaled algorithm, the crime rate can be reduced by 78%, implying an accuracy of 0.78.

figure 2

Functionality of proposed approach

In ref. [ 50 ], a system is proposed that predicts crime by analyzing a dataset containing records of previously committed crimes and their patterns. The proposed system works mainly on two ML algorithms: a decision tree and KNN. Techniques such as the random forest algorithm and Adaptive Boosting were used to increase the accuracy of the prediction model. To obtain better results for the model, the crimes were divided into frequent and rare classes. The frequent class consisted of the most frequent crimes, whereas the rare class consisted of the least frequent crimes. The proposed system was fed with criminal activity data for a 12-year period in San Francisco, United States. Using undersampling and oversampling methods along with the random forest algorithm, the accuracy was surprisingly increased to 99.16%.
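
As a rough illustration of the resampling step described above (not the study's exact procedure), the sketch below oversamples a rare class with sklearn.utils.resample before fitting a random forest; the rare/frequent split, label encoding, and feature matrix are assumptions.

```python
# Hedged sketch: oversample the rare class, then fit a random forest.
import numpy as np
from sklearn.utils import resample
from sklearn.ensemble import RandomForestClassifier

def oversample_rare(X, y, rare_label):
    """Duplicate rare-class rows until they match the frequent-class count."""
    X, y = np.asarray(X), np.asarray(y)
    rare = (y == rare_label)
    X_up, y_up = resample(
        X[rare], y[rare],
        n_samples=int((~rare).sum()),   # match the frequent-class count
        replace=True, random_state=0,
    )
    return np.vstack([X[~rare], X_up]), np.concatenate([y[~rare], y_up])

# Hypothetical usage with already-encoded features:
# X_bal, y_bal = oversample_rare(X, y, rare_label=1)
# clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_bal, y_bal)
```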

In ref. [ 51 ], a detailed study on crime classification and prediction using ML and deep learning architectures is presented. Certain ML methodologies, such as random forest, naïve Bayes, and an SVM have been used in the literature to predict the number of crimes and hotspot prediction. Deep learning is a ML approach that can overcome the limitations of some machine-learning methodologies by extracting the features from the raw data. This paper presents three fundamental deep learning configurations for crime prediction: (1) spatial and temporal patterns, (2) temporal and spatial patterns, and (3) spatial and temporal patterns in parallel. Moreover, the proposed model was compared with 10 state-of-the-art algorithms on 5 different crime prediction datasets with more than 10 years of crime data.

In ref. [ 52 ], a big data and ML technique for behavior analysis and crime prediction is presented. This paper discusses the tracking of information using big data, different data collection approaches, and the last phase of crime prediction using ML techniques based on data collection and analysis. A predictive analysis was conducted through ML using RapidMiner by processing historical crime patterns. The research was mainly conducted in four phases: data collection, data preparation, data analysis, and data visualization. It was concluded that big data is a suitable framework for analyzing crime data because it can provide a high throughput and fault tolerance, analyze extremely large datasets, and generate reliable results, whereas the ML based naïve Bayes algorithm can achieve better predictions using the available datasets.

In ref. [ 53 ], various data mining and ML technologies used in criminal investigations are demonstrated. The contribution of this study is highlighting the methodologies used in crime data analytics. Various ML methods, such as a KNN, SVM, naïve Bayes, and clustering, were used for the classification, understanding, and analysis of datasets based on predefined conditions. By understanding and analyzing the data available in the crime record, the type of crime and the hotspot of future criminal activities can be determined. The proposed model was designed to perform various operations such as feature selection, clustering, analysis, prediction, and evaluation of the given datasets. This research proves the necessity of ML techniques for predicting and analyzing criminal activities.

In ref. [ 54 ], the authors incorporated the concept of a grid-based crime prediction model and established a range of spatial-temporal features based on 84 types of geographic locations for a city in Taiwan. The concept uses ML algorithms to learn the patterns and predict crime for the following month for each grid. Among the many ML methods applied, the best model was found to be a DNN. The main contribution of this study is the use of the most recent ML techniques, including the concept of feature learning. In addition, the testing of crime displacement also showed that the proposed model design outperformed the baseline.

In ref. [ 55 ], the authors considered the development of a crime prediction model using the decision tree (J48) algorithm. When applied in the context of law enforcement and intelligence analysis, J48 holds the promise of mollifying crime rates and is considered the most efficient ML algorithm for the prediction of crime data in the related literature. The J48 classifier was developed using the WEKA tool kit and later trained on a preprocessed crime dataset. The experimental results of the J48 algorithm predicted the unknown category of crime data with an accuracy of 94.25287%. With such high accuracy, it is fair to count on the system for future crime predictions.

Comparative study of different forecasting methods

First, in refs. [ 56 , 57 ], the authors predicted crime using the KNN algorithm in the years 2014 and 2013, respectively. Sun et al. [ 56 ] proved that a higher crime prediction accuracy can be obtained by combining the grey correlation analysis based on new weighted KNN (GBWKNN) filling algorithm with the KNN classification algorithm. Using the proposed algorithm, they were able to obtain an accuracy of approximately 67%. By contrast, Shojaee et al. [ 57 ] divided crime data into two parts, namely, critical and non-critical, and applied a simple KNN algorithm. They achieved an astonishing accuracy of approximately 87%.

Second, in refs. [ 58 , 59 ], crime is predicted using a decision tree algorithm for the years 2015 and 2013, respectively. In their study, Obuandike et al. [ 58 ] used the ZeroR algorithm along with a decision tree but failed to achieve an accuracy of above 60%. In addition, Iqbal et al. [ 59 ] achieved a stunning accuracy of 84% using a decision tree algorithm. In both cases, however, a small change in the data could lead to a large change in the structure.

Third, in refs. [ 60 , 61 ], a novel crime detection technique called naïve Bayes was implemented for crime prediction and analysis. Jangra and Kalsi [ 60 ] achieved an astounding crime prediction accuracy of 87%, but could not apply their approach to datasets with a large number of features. By contrast, Wibowo and Oesman [ 61 ] achieved an accuracy of only 66% in predicting crimes and failed to consider the computational speed, robustness, and scalability.

Below, we summarize the above comparison and add other models to further illustrate this comparative study and the accuracy of some frequently used models (Table  1 ).

Computer vision models combined with machine and deep learning techniques

In ref. [ 66 ], the study focused on three main questions. First, the authors question whether computer vision algorithms actually work. They stated that the accuracy of the prediction is 90% over fewer complex datasets, but the accuracy drops to 60% over complex datasets. Another concern we need to focus on is reducing the storage and computational costs. Second, they question whether it is effective for policing. They determined that a distinct activity detection is difficult, and pinpointed a key component, the Public Safety Visual Analytics Workstation, which includes many capabilities ranging from detection and localization of objects in camera feeds to labeling actions and events associated with training data, and allowing query-based searches for specific events in videos. By doing so, they aim to view every event as a computer-vision trained, recognized, and labeled event. The third and final question they ask is whether computer vision impacts the criminal justice system. The answer to this from their end is quite optimistic to say the least, although they wish to implement computer vision alone, which we suspect is unsatisfactory.

In ref. [ 67 ], a framework for multi-camera video surveillance is presented. The framework is designed so efficiently that it performs all three major activities of a typical police “stake-out”, i.e., detection, representation, and recognition. The detection part mixes video streams from multiple cameras to efficiently and reliably extract motion trajectories from videos. The representation helps in concluding the raw trajectory data to construct hierarchical, invariant, and content-rich descriptions of motion events. Finally, the recognition part deals with event classification (such as robbery and possibly murder and molestation, among others) and identification of the data descriptors. For an effective recognition, they developed a sequence-alignment kernel function to perform sequence data learning to identify suspicious/possible crime events.

In ref. [ 68 ], a method is suggested for identifying people for surveillance with the help of a new feature called soft biometry, which includes a person’s height, build, skin tone, shirt and trouser color, motion pattern, and trajectory history to identify and track passengers, which further helps in predicting crime activities. They have gone further and discussed some absurd human error incidents that have resulted in the perpetrators getting away. They also conducted experiments, the results of which were quite astounding. In one case, the camera caught people giving piggyback rides in more than one frame of a single shot video. The second scenario shows the camera’s ability to distinguish between airport guards and passengers.

In ref. [ 69 ], the authors discussed automated visual surveillance in a realistic scenario and used Knight, which is a multiple camera surveillance and monitoring system. Their major targets were to analyze the detection, tracking, and classification performances. The detection, tracking, and classification accuracies were 97.4%, 96.7%, and 88%, respectively. The authors also pointed to the major difficulties of illumination changes, camouflage, uninteresting moving objects, and shadows. This research again proves the reliability of computer vision models.

It is well known that an ideal scenario for a camera to achieve a perfect resolution is not possible. In ref. [ 70 ], the authors noted that security surveillance systems often produce poor-quality video, which can be a hurdle in gathering forensic evidence. They examined the ability of subjects to identify targeted individuals captured by a commercially available video security device. In the first experiment, subjects personally familiar with the targets performed extremely well at identifying them, whereas subjects unfamiliar with the targets performed quite poorly. Although these results might not seem to be very conclusive and efficient, police officers with experience in forensic identification performed as poorly as other subjects unfamiliar with the targets. In the second experiment, they asked how familiar subjects could perform so well, and then used clips from the same video device edited to obscure the head, body, or gait of the targets. Hiding the body or gait produced a small decrease in recognition performance. Hiding the target heads had a dramatic effect on the subject’s ability to recognize the targets. This indicates that even if the quality of the video is low, recognition depends largely on the target’s head being visible.

In ref. [ 71 ], an automatic number plate recognition (ANPR) model is proposed. The authors described it as an “image processing innovation”. The ANPR system consists of the following steps: (1) vehicle image capture, (2) preprocessing, (3) number plate extraction, (4) character segmentation, and (5) character recognition. Before the main image processing, a pre-processing of the captured image is conducted, which includes converting the red, green and blue image into a gray image, noise removal, and border enhancement for brightness. The plate is then separated by judging its size. In character segmentation, the letters and numbers are separated and viewed individually. In character recognition, optical character recognition is applied to a given database.
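
A rough OpenCV sketch of the first ANPR stages listed above (capture, preprocessing, and plate-candidate extraction) is shown below; character segmentation and OCR are omitted, and the input image name and thresholds are assumptions.

```python
# Hedged sketch of ANPR preprocessing and plate-candidate extraction with OpenCV.
import cv2

img = cv2.imread("car.jpg")                       # 1) hypothetical captured vehicle image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)      # 2) RGB -> gray
gray = cv2.bilateralFilter(gray, 11, 17, 17)      #    noise removal, edges preserved
edges = cv2.Canny(gray, 30, 200)                  #    edge enhancement

# 3) plate extraction: keep contours with a plate-like aspect ratio.
contours, _ = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
candidates = []
for c in sorted(contours, key=cv2.contourArea, reverse=True)[:20]:
    x, y, w, h = cv2.boundingRect(c)
    if 2.0 < w / float(h) < 6.0:                  # typical licence-plate proportions
        candidates.append((x, y, w, h))
print("plate candidates:", candidates[:3])
```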

Although real-time crime forecasting is vital, it is extremely difficult to achieve in practice. No known physical models provide a reasonable approximation with dependable results for such a complex system. In ref. [ 72 ], the authors adapted a spatial temporal residual network to well-represented data to predict the distribution of crime in Los Angeles at an hourly scale in neighborhood-sized parcels. These experiments were compared with several existing approaches for prediction, demonstrating the superiority of the proposed model in terms of accuracy. They compared their deep learning approach to ARIMA, KNN, and the historical average. In addition, they presented a ternarization technique to address the concerns of resource consumption for deployment in the real world.
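
For context, the "historical average" baseline that such deep models are typically compared against can be written in a few lines; the array layout below (days × hours × cells) and the Poisson toy data are assumptions.

```python
# Sketch of a historical-average baseline: forecast each cell's next count for a
# given hour as the mean of that hour's counts over previous days.
import numpy as np

def historical_average(counts, hour):
    """counts: array of shape (days, 24, n_cells); returns a per-cell forecast."""
    return counts[:, hour, :].mean(axis=0)

counts = np.random.poisson(1.5, size=(30, 24, 100))   # 30 days, hourly, 100 cells
print(historical_average(counts, hour=22)[:5])
```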

In ref. [ 73 ], the authors conducted a significant study on crime prediction and showed the importance of non-crime data. The major objective of this research was taking advantage of DNNs to achieve crime prediction in a fine-grain city partition. They made predictions using Chicago and Portland crime data, which were further augmented with additional datasets covering the weather, census data, and public transportation. In the paper, they split each city into grid cells (beats for Chicago and a square grid for Portland). The crime numbers are broken into 10 bins, and their model predicts the most likely bin for each spatial region at a daily level. They trained increasingly complex neural network structures on these data, including variations that are suited to the spatial and temporal aspects of the crime prediction problem. Using their model, they were able to predict the correct bin for the overall number of crimes with an accuracy of 75.6% for Chicago and 65.3% for Portland. They showed that the added value of additional non-crime data was an important factor. They found that days with higher amounts of precipitation and snow decreased the accuracy of the model slightly. Then, considering the impact of transportation, the bus routes and train routes were examined within their beats, and it was shown that the prediction accuracy for a beat containing a train station is on average 1.2% higher than for its neighboring beats. A beat that contained one or more train lines passing through it was 0.5% more accurate than its neighboring beats.
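
A small pandas sketch of the count-binning step described above is given here; the column names and toy incident table are assumptions, not the study's data.

```python
# Hedged sketch: discretise daily per-cell crime counts into 10 bins.
import pandas as pd

# Hypothetical incident table: one row per reported crime.
incidents = pd.DataFrame({
    "beat": ["0111", "0111", "0112", "0112", "0112"],
    "date": pd.to_datetime(["2016-05-01"] * 3 + ["2016-05-02"] * 2),
})

daily_counts = incidents.groupby(["beat", "date"]).size().rename("count").reset_index()
daily_counts["count_bin"] = pd.cut(daily_counts["count"], bins=10, labels=False)
print(daily_counts)
```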

In ref. [ 74 ], the authors taught a system how to monitor traffic and identify vehicles at night. They used the bright spots of the headlights and tail lights to identify an object first as a vehicle, and the bright light is extracted with a segmentation process, and then processed by a spatial clustering and tracking procedure that locates and analyzes the spatial and temporal features of the vehicle light. They also conducted an experiment in which, for a span of 20 min, the detection scores for cars and bikes were 98.79% and 96.84%, respectively. In another part of the test, they conducted the same test under the same conditions for 50 min, and the detection scores for cars and bikes were 97.58% and 98.48%, respectively. It is good for machines to be built at such a beginning level. This technology can also be used to conduct surveillance at night.

In ref. [ 75 ], an important approach for human motion analysis is discussed. The author mentions that human motion analysis is difficult because appearances are extremely variable, and thus stresses that focusing on marker-less vision-based human motion analysis has the potential to provide a non-obtrusive solution for the evaluation of body poses. The author claims that this technology can have vast applications such as surveillance, human-computer interaction, and automatic annotation, and will thus benefit from a robust solution. In this paper, the characteristics of human motion analysis are discussed. The analysis is divided into two phases, modeling and estimation. The modeling phase includes the construction of the likelihood function [including the camera model, image descriptors, human body model and matching function, and (physical) constraints], and the estimation phase is concerned with finding the most likely pose given the likelihood (function result) of the surface. Model-free approaches are discussed separately.

In ref. [ 76 ], the authors provided insight into how we can achieve crime mapping using satellites. The need for manual data collection for mapping is costly and time consuming. By contrast, satellite imagery is becoming a great alternative. In this paper, they attempted to investigate the use of deep learning to predict crime rates directly from raw satellite imagery. They trained a deep convolutional neural network (CNN) on satellite images obtained from over 1 million crime-incident reports (15 years of data) collected by the Chicago Police Department. The best performing model predicted crime rates from raw satellite imagery with an astounding accuracy of 79%. To make their research more thorough, they conducted a test for reusability, and used the tested and learned Chicago models for prediction in the cities of Denver and San Francisco. Compared to maps made from years of data collected by the corresponding police departments, their maps have an accuracy of 72% and 70%, respectively. They concluded the following: (1) Visual features contained in satellite imagery can be successfully used as a proxy indicator of crime rates; (2) ConvNets are capable of learning models for crime rate prediction from satellite imagery; (3) Once deep models are used and learned, they can be reused across different cities.
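
The trained ConvNet itself is not reproduced here; the toy PyTorch CNN below merely illustrates the general shape of such a model, mapping an RGB image tile to a single crime-rate score. The tile size, channel counts, and output head are assumptions.

```python
# Toy CNN sketch: satellite tile -> crime-rate score (illustrative architecture only).
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
    nn.Linear(64, 1),                      # predicted crime-rate score for the tile
)
tiles = torch.randn(8, 3, 64, 64)          # a batch of hypothetical 64x64 RGB tiles
print(cnn(tiles).shape)                    # torch.Size([8, 1])
```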

In ref. [ 77 ], the authors suggested an extremely intriguing research approach in which they claim to show that looking beyond what is visible means inferring meaning from what is viewed in an image. They even conducted an interesting study on determining where a McDonald's could be located simply from photographs, and provided the possibility of predicting crime. They compared the human accuracy on this task, which was 59.6%, and the accuracy of using gradient-based features, which was 72.5%, with a chance performance (a chance performance is what you would obtain if you performed at random) of only 50%. This indicates the presence of some visual cues that are not easily spotted by an average human but can be spotted by a machine, thus enabling us to judge whether an area is safe. The authors indicated that numerous factors are often associated with our intuition, which we use to avoid certain areas because they may seem “shady” or “unsafe”.

In ref. [ 78 ], the authors describe in two parts how close we are to achieving a fully automated surveillance system. The first part views the possibility of surveillance in a real-world scenario where the installation of systems and maintenance of systems are in question. The second part considers the implementation of computer vision models and algorithms for behavior modeling and event detection. They concluded that the complete scenario is under discussion, and therefore many people are conducting research and obtaining results. However, as we look closely, we can see that reliable results are possible only in certain aspects, while other areas are still in the development process, such as obtaining information on cars and their owners as well as accurately understanding the behavior of a possible suspect.

Many times during criminal activities, convicts use hand gestures to signal messages to each other. In ref. [ 79 ], research on hand gesture recognition was conducted using computer vision models. Their application architecture is of extremely high quality and is easy to understand. They begin by capturing images, and then try detecting a hand in the background. They apply either a CAMShift-style procedure or a different procedure in which they first convert a picture into gray scale, after which they set the image region of interest (ROI), and then find and extract the biggest contour. They then determine the convex hull of the contour to find an orientation around the bounding rectangle, and finally interpret the gesture and convert it into a meaningful command.
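
A rough OpenCV sketch of the contour-and-convex-hull stage described above follows; the frame source, ROI coordinates, and thresholds are assumptions rather than the authors' parameters.

```python
# Hedged sketch: grayscale, ROI, largest contour, convex hull for a hand gesture.
import cv2

frame = cv2.imread("hand_frame.jpg")                     # hypothetical captured frame
roi = frame[100:400, 100:400]                            # fixed region of interest
gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (7, 7), 0)
_, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    hand = max(contours, key=cv2.contourArea)            # biggest contour = hand
    hull = cv2.convexHull(hand)
    x, y, w, h = cv2.boundingRect(hand)                  # orientation / bounding box
    print("hull points:", len(hull), "bounding box:", (x, y, w, h))
```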

Crime hotspots or areas with high crime intensity are places where the future possibility of a crime exists along with the possibility of spotting a criminal. In ref. [ 80 ], the authors conducted research on forecasting crime hotspots. They used Google TensorFlow to implement their model and evaluated three options for the recurrent neural network (RNN) architecture, comparing them on accuracy, precision, and recall; for each metric, a larger value indicates better performance. The gated recurrent unit (GRU) and long short-term memory (LSTM) versions obtained similar performance levels with an accuracy of 81.5%, precision of 86%–87%, recall of 75%, and F1-score of 0.8. Both performed much better than the traditional RNN version. Based on the area under the ROC curve (AUC) performance observations, the GRU version was 2% better than the RNN version. The LSTM version achieved the best AUC score, which was improved by 3% over the GRU version.
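
As a minimal sketch of an LSTM-based hotspot classifier in the spirit of that work (not the authors' TensorFlow model), the PyTorch snippet below assumes a sequence of weekly feature vectors per grid cell; the sequence length, feature count, and hidden size are assumptions.

```python
# Hedged sketch: LSTM over weekly per-cell features -> hotspot probability.
import torch
import torch.nn as nn

class HotspotLSTM(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (batch, weeks, features per cell)
        _, (h, _) = self.lstm(x)
        return torch.sigmoid(self.out(h[-1]))   # probability the cell is a hotspot

model = HotspotLSTM(n_features=8)
print(model(torch.randn(16, 12, 8)).shape)      # torch.Size([16, 1])
```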

In ref. [ 81 ], a spatiotemporal crime network (STCN) is proposed that applies a CNN for predicting crime before it occurs. The authors evaluated the STCN using 311 felony datasets from New York from 2010 to 2015. The results were extremely impressive, with the STCN achieving an F1-score of 88% and an AUC of 92%, which confirmed that it exceeded the performance of the four baselines. Their proposed model achieved the best performance in terms of both F1 and AUC, which remained better than those of the other baselines even when the time window reached 100. This study provides evidence that the system can function well even in a metropolitan area.

Proposed idea

After finding and understanding various distinct methods used by the police for surveillance purposes, we determined the importance of each method. Each surveillance method can perform well on its own and produce satisfactory results, although for only one specific characteristic, that is, if we use a Sting Ray, it can help us only when the suspect is using a phone, which should be switched on. Thus, it is only useful when the information regarding the stake out location is correct. Based on this information, we can see how the ever-evolving technology has yet again produced a smart way to conduct surveillance. The introduction of deep learning, ML, and computer vision techniques has provided us with a new perspective on ways to conduct surveillance. This is an intelligent approach to surveillance because it tries to mimic a human approach, but it does so 24 h a day, 365 days a year, and once it has been taught how to do things it does them in the same manner repeatedly.

Although we have discussed the aspects that ML and computer vision can achieve, what are these aspects essentially? This brings us to the main point of our paper, i.e., our proposed idea, which is to combine the positive aspects of Sting Ray, body cams, facial recognition, number plate recognition, and stakeouts. New features include core analytics, neural networks, heuristic engines, recursion processors, Bayesian networks, data acquisition, cryptographic algorithms, document processors, computational linguistics, voiceprint identification, natural language processing, gait analysis, biometric recognition, pattern mining, intel interpretation, threat detection, and threat classification. The new features are completely computer dependent and hence require human interaction for development; however, once developed, the system functions without human interaction and frees humans for other tasks. Let us understand the use of each function.

Core analytics: This includes having knowledge of a variety of statistical techniques, and by using this knowledge, predict future outcomes, which in our case are anything from behavioral instincts to looting a store in the near future.

Neural networks: This is a concept consisting of a large number of algorithms that help in finding the relation between data by acting similar to a human brain, mimicking biological nerve cells and hence trying to think on its own, thus understanding or even predicting a crime scene.

Heuristic engines: These are engines with data regarding antiviruses, and thus knowledge about viruses, increasing the safety of our system as it identifies the type of threat and eliminates it using known antiviruses.

Cryptographic algorithms: Such algorithms are used in two parts. First, they privately encode the known confidential criminal data. Second, they are used to keep the newly discovered potential crime data encrypted.

Recursion processors: These are used to apply the functions of our machine repeatedly to make sure they continuously work and never break the surveillance of the machine.

Bayesian networks: These are probabilistic acyclic graphical models that can be used for a variety of purposes such as prediction, anomaly detection, diagnostics, automated insight, reasoning, time series prediction, and decision making under uncertainty.

Data acquisition: This might be the most important part because our system has to possess the knowledge of previous crimes and learn from them to predict future possible criminal events.

Document processors: These are used after the data collection, primarily for going through, organizing, analyzing, and learning from the data.

Computational linguistics: Using algorithms and learning models, this method attempts to give a computer the ability to understand human spoken language, which would be groundbreaking, allowing a machine to not only identify a human but also understand what the human is saying.

Natural language processor: This is also used by computers to better understand human linguistics.

Voice print identification: This is an interesting application, which tries to distinguish one person’s voice from another, making it even more recognizable and identifiable. It identifies a target with the help of certain characteristics, such as the configuration of the speaker’s mouth and throat, which can be expressed as a mathematical formula.

Gait analysis: This will be used to study human motion, understanding posture while walking. It will be used to better understand the normal pace of a person and thus judge an abnormal pace.

Biometric recognition: This is used to identify individuals by their face or, if possible, by their thumb print stored in a few different databases.

Pattern mining: This is a subset of data mining and helps in observing patterns among routine activities. The use of this technology will help us identify if a person is seen an unusual number of times behind a pharmacy window at a particular time, allowing the machine to alert the authorities.

Intel interpretation: This is also used to make sense of the information gathered, and will include almost all features mentioned above, combining the results of each and making a final meaningful prediction.

Threat detection: A threat will be detected if, during intel processing, a certain number of criteria predefined when building the system are met.

Threat classification: As soon as a threat is detected, it is classified, and the threat can then be categorized into criminal case levels, including burglary, murder, or a possible terrorist attack; thus, based on the time line, near or distant future threats might be predictable.

Combining all of these features, we aim to produce software that has the capability of becoming a universal police officer, having eyes and ears everywhere. Obviously, we intend to use the CCTVs in urban areas during a preliminary round to observe the functioning of such software in a real-world scenario. The idea is to train and make the software learn all previously recorded crimes whose footage is available (at least 5000 cases for optimum results), through supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning to help it to understand what a crime actually is. Thus, it will achieve a better understanding of criminality and can answer how crimes happen, as well as why and where. We do not propose simply making a world-class model to predict crimes; we also suggest making it understand previous crimes to better judge and therefore better predict them.

We aim to use this type of technology on two fronts: first and most importantly, for predicting crimes before they happen, followed by a thorough analysis of a crime scene allowing the system to possibly identify aspects that even a human eye can miss.

The most interesting cutting-edge and evolutionary idea that we believe should be incorporated is the use of scenario simulations. After analyzing the scene and using the 17 main characteristics mentioned above, the software should run at least 50 simulations of the present scenario presented in front of it, which will be assisted by previously learned crime recordings. The simulation will help the software in asserting the threat level and then accordingly recommend a course of action or alert police officials.
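
A highly simplified sketch of this simulation idea is shown below: repeated randomized rollouts of an assessed scenario are aggregated into an escalation probability. The escalation model, noise level, and decision threshold are entirely hypothetical.

```python
# Hedged sketch: Monte Carlo rollouts of a scenario to estimate escalation risk.
import random

def run_scenario(threat_score, noise=0.15):
    """One rollout: perturb the assessed threat score and decide if it escalates."""
    return (threat_score + random.gauss(0, noise)) > 0.5

def simulate(threat_score, n_runs=50, seed=42):
    random.seed(seed)
    escalations = sum(run_scenario(threat_score) for _ in range(n_runs))
    return escalations / n_runs          # estimated probability of escalation

print(simulate(threat_score=0.42))       # e.g. ~0.3 -> recommend monitoring, not an alert
```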

To visualize a possible scenario where we are able to invent such software, we prepared a flow chart (Fig.  3 ) to better understand the complete process.

figure 3

Flowchart of the proposed model. Data are absorbed from the surroundings with the help of cameras and microphones. If the system flags an activity as suspicious, it gathers more intel, allowing the facial algorithms to match against a large database such as a Social Security Number or Aadhaar card database. When it detects a threat, it also classifies it into categories such as the nature of the crime and the time span within which it is likely to take place. With all the gathered intel and the necessary details of the possible crime, it alerts the respective authority with a 60-word synopsis to give them a brief idea, allowing law enforcement agencies to take action accordingly

Although this paper is grounded in detailed research, there are certain challenges that can pose problems in the future. First, the whole system has to be built correctly and completely in the near future so that its implementation can take place promptly and properly. Furthermore, the implementation itself is a significant concern, as such technologies cannot be deployed directly in the open world. The system must first be tested in a small part of a metropolitan area, and only then, with constant improvements (revisions of the first model), can its usage be scaled up. Hence, the challenges are better seen as helping to perfect the model and thus gradually producing a model that can be applied to the real world. Moreover, there are a few hurdles in the technological aspects of the model, as the size of the learning data will be enormous and processing it may take days or even weeks. Although these challenges need to be addressed, they are aspects that a collective team of experts can overcome with due diligence, and if so, the end product will be worth the hard work and persistence.

Future scope

This paper presented techniques and methods that can be used to predict crime and help law enforcement agencies. Using different methods for crime prediction and prevention can change how law enforcement agencies operate. A combination of ML and computer vision can substantially impact their overall functionality. In the near future, by combining ML and computer vision with security equipment such as surveillance cameras and spotting scopes, a machine can learn the patterns of previous crimes, understand what crime actually is, and predict future crimes accurately without human intervention. A possible automation would be a system that predicts and anticipates the zones of crime hotspots in a city, as sketched below; law enforcement agencies could be warned and prevent crime from occurring by implementing more surveillance within the predicted zones. This complete automation can overcome the drawbacks of the current system, and law enforcement agencies can depend more on these techniques in the near future. Designing a machine to anticipate and identify patterns of such crimes will be the starting point of our future study. Although current systems already have a large impact on crime prevention, this could be the next big approach and bring about a revolutionary change in crime rates, prediction, detection, and prevention, i.e., a “universal police officer”.
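
One simple baseline for anticipating hotspot zones, sketched under the assumption that historical incident coordinates are available: bin past incidents into a spatial grid and flag the densest cells for extra surveillance. Counting past incidents is only a starting point; the literature uses far richer spatiotemporal models.

```python
import numpy as np


def hotspot_cells(lons, lats, bins=20, top_k=10):
    """Bin historical incidents into a grid and return the densest cells."""
    counts, lon_edges, lat_edges = np.histogram2d(lons, lats, bins=bins)
    flat = counts.ravel()
    top = np.argsort(flat)[::-1][:top_k]
    cells = []
    for idx in top:
        i, j = np.unravel_index(idx, counts.shape)
        cells.append({
            "count": int(flat[idx]),
            "lon_range": (lon_edges[i], lon_edges[i + 1]),
            "lat_range": (lat_edges[j], lat_edges[j + 1]),
        })
    return cells


# Hypothetical usage with past incident coordinates:
# zones = hotspot_cells(incident_lons, incident_lats)
```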

Conclusions

Predicting crimes before they happen is simple to understand, but making it a reality takes much more than understanding the concept. This paper was written to assist researchers aiming to make crime prediction a reality and to implement such advanced technology in real life. Although police adopt new technologies such as StingRays and facial recognition every few years, the implementation of software like this could fundamentally change, and improve, the way police work. This paper outlined a framework envisaging how machine and deep learning, along with computer vision, can help create a system that is much more helpful to the police. Our proposed system comprises a collection of technologies that perform everything from monitoring crime hotspots to recognizing people from their voice notes. The first difficulty will be to actually build the system, followed by problems such as its implementation and use, among others. However, all of these problems are solvable, and we can also benefit from a security system that monitors the entire city around the clock. In other words, in a world where such a system is incorporated into a police force, far more reliable tips or leads can be obtained and perhaps crime can be eradicated at a much faster rate.

Availability of data and materials

All relevant data and material are presented in the main paper.

Abbreviations

  • ML: Machine learning
  • NP: Nondeterministic polynomial
  • WEKA: Waikato Environment for Knowledge Analysis
  • KNN: K-nearest neighbor
  • ANPR: Automatic number plate recognition
  • DNN: Deep neural network
  • KDE: Kernel density estimation
  • SVM: Support vector machine
  • GBWKNN: Grey correlation analysis based on new weighted KNN
  • ARIMA: Autoregressive integrated moving average
  • STCN: Spatiotemporal crime network
  • CNN: Convolutional neural network
  • AUC: Area under the ROC curve
  • RNN: Recurrent neural network
  • GRU: Gated recurrent unit
  • LSTM: Long short-term memory
  • APE: Absolute percent error


Acknowledgements

The authors are grateful to the Department of Computer Engineering, SAL Institute of Technology and Engineering Research, and the Department of Chemical Engineering, School of Technology, Pandit Deendayal Energy University for permission to publish this research.

Funding

Not applicable.

Author information

Authors and Affiliations

Department of Computer Engineering, Sal Institute of Technology and Engineering Research, Ahmedabad, Gujarat, 380060, India

Neil Shah & Nandish Bhagat

Department of Chemical Engineering, School of Technology, Pandit Deendayal Energy University, Gandhinagar, Gujarat, 382426, India


Contributions

All the authors made substantial contributions to this manuscript. NS, NB, and MS drafted and wrote the main manuscript; all the authors discussed the results and their implications at all stages. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Manan Shah .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Shah, N., Bhagat, N. & Shah, M. Crime forecasting: a machine learning and computer vision approach to crime prediction and prevention. Vis. Comput. Ind. Biomed. Art 4 , 9 (2021). https://doi.org/10.1186/s42492-021-00075-z


Received : 18 July 2020

Accepted : 05 April 2021

Published : 29 April 2021

DOI : https://doi.org/10.1186/s42492-021-00075-z


Keywords

  • Computer vision
  • Crime forecasting


DESIGN AND IMPLEMENTATION OF AN ONLINE CRIME REPORTING SYSTEM

Chisom Nnadimma

An ideal society is governed by laws, and measurable consequences are meted out to any member of the society found guilty of law breaking. Customarily, members of the society are expected to report any breakdown of law and order to the appropriate law enforcement authorities. In times past, the process of reporting crimes in the society (Nigeria) involved going into one of the offices of the law enforcement agencies (e.g., the Police or neighborhood corps) to make a report, which made anonymity next to impossible. But the advent of technology opened more avenues for reporting crimes: from the telegraph, special radio communication, and dedicated phone lines to more responsive and more pervasive technological platforms (web and mobile software applications). This project develops an all-encompassing web platform that reports all manner of crimes, is open to all members of the public, and is suggestive (search for entities), interpretative, and enlightening. It also provides anonymity while reporting crime, for those who desire it.

Related Papers

Mohammed Sharief

National surveys demonstrate that millions of crimes go unreported, and several reasons may contribute to this lack of reporting. Crime reporting also needs to be possible 24/7. Although several other, more publicized reporting mechanisms exist, Internet-based crime reporting systems allow victims and witnesses of crime to report incidents to the police 24/7 from any location. The aim of this project is to develop an online crime reporting and management system that is easily accessible to the public, the police department, and the administrative department. The system is intended for use in a community to help residents interact with each other more easily and to encourage the reporting of suspicious behaviour or crime.



IRJET Journal

The aim of this project is to develop an online crime report management system that is easily accessible to the general public and the police department. The system provides users with information about the crime rates of a desired area, which is useful for tourists entering an unfamiliar area. If the user comes within a 100-meter radius of a high-alert area, he/she is notified with an alert message. The user is notified about the different crime rates in the area and can be given the safest path to the desired destination. The system also registers complaints from people through an online web application where they can upload images and videos of the crime, which helps the police department in catching criminals. A person can file a complaint at any time.


Agnes Akinwole

Web Based Criminal Diary is a web-based application in which the records of criminals convicted by a judge in a Nigerian court of law are shown to the general public. Presently, criminal records are kept manually in Nigeria, which means that when a person needs to be investigated and it must be determined whether the person has a criminal record in the country, several manual processes are required. With manual record keeping, criminal records can easily be manipulated by the people in charge. The focus of this research work is to design a web-based application system for criminal records in Nigeria, eliminating the challenges surrounding the manual processing currently in use (such as loss of criminal records, inefficiency in criminal record keeping, data manipulation, and the menace of fully paper-based record keeping). The product of this research work will also help to minimize the crime rate in the country, since the opportunities and benefits lost as a result of a criminal record create lifelong barriers for anyone attempting to overcome a criminal past.

IJAERS Journal

This project automates a web-based crime and criminal tracking system (CCTS). The system is a web application that enables users at international hotels and at the four police stations to communicate with each other and with the district police office. All users of the system communicate through the internet service provider’s infrastructure, but if the internet is down, the system works offline. The system and its database are configured or replicated on the police stations’ mini-server computers in each woreda, and whenever the network is available the mini servers synchronize data with the central server at the district police office. Before this project, the international hotels filled in a daily form about their booked-in customers and sent it by hand to the nearby police station; the CCTS alleviates this challenge. Using the web system, the hotel manager or receptionist accesses the customer check-in registration page from the server, fills it in, and hits the submit button. The information is submitted to the police office through the network and saved to their computer at the same time. While the data are being submitted, they are checked against the criminals list using a primary key or unique number; if a criminal or suspect is found, an alert or report is displayed to the system administrator with detailed information. The prototype has been tested with data from the Adama city district police office. It has been observed that the system successfully registers crimes and criminals, lost property, international hotels, and customers booked into hotels, generates reports, and provides search facilities. In addition, customers are not expected to go to the four police stations or woredas if they want to register lost property; once lost property, crimes, and criminals are registered at a station, the records are shared across the network. It has also been shown that the system facilitates viewing the status of cases and knowledge bases. This system turns manual work into digital work, reduces time, produces accurate results, and also implements security for the information.

International Journal of Innovative Technology and Exploring Engineering

Douglas Kelechi

Crime detection, investigation, and prosecution are usually carried out by the various law enforcement agencies saddled with such responsibilities. In this study, an integrated web-based unified system was developed and implemented for five agencies (the Nigerian Police Force, National Drug Law Enforcement Agency, Economic and Financial Crimes Commission, Independent Corrupt Practices Commission, and Department of State Services) to bring crime data into one system for effective information sharing among the five agencies. The methodology adopted for the system design is the Object Oriented Analysis and Design Methodology (OOADM), and the tools used are HTML, CSS, JavaScript, and MySQL. The results show that with the integration of the five agencies, accurate records of suspects and victims were shared among the agencies in a timely manner. There is also effective collaboration among the agencies in crime detection, investigation, and prosecution of suspects.

Oladele Adeola

The menace of crime, particularly armed robbery, kidnapping, oil theft, and burglary, in Nigeria today is real and potent. Many schemes have been devised to control this trend, but current strategies have not been able to reduce the threat. E-neighborhood is a neighborhood-originated electronic monitoring system that monitors the movement of criminals in a neighborhood in order to nip their activities in the bud, so as to either prevent crime or track crime after it is committed. In this work we examine the various crime control methods adopted so far to reduce incidences of armed robbery in Nigeria and propose an e-neighborhood paradigm for crime detection and control. The system has the basic components of a Voice to Text (VTT) interface, a Centre Control (CC), a main server, sub-servers, pagers, and mobile phones. It is capable of receiving a voice message of a crime alert and automatically translating the voice message into a text message for further processing.

IJRASET Publication

The crime investigation system keeps track of and maintains the history of each and every case in the database, which ultimately avoids manual paper work. It is based on a decentralized client-server architecture in order to facilitate the independent functioning of all units; information flows from lower units to higher units and vice versa. The project is built on a multi-tier architecture. It provides quick access to data, which is essential for the effective prevention, control, and detection of crime, and it supports decision-making processes. The crime investigation system is an intelligent decision support system that can assist human investigators by automatically constructing plausible scenarios. Some of its benefits over the existing system are quick retrieval of data, huge savings of time, proper deployment and utilization of manpower, and cost reduction leading to savings in expenditure. It enables the police service to supervise and administer any incident and process-related information by providing standardized processes and procedures to monitor and improve performance. Using this application, people can file their complaints online. To register any type of complaint, they need to register their personal details along with login details. Once registered in the application, they can post their complaints.


Crime Reporting Systems and Methods Research Paper


In order to better understand, explain, and control crime, one needs accurate counts of its occurrence. Crime statistics represent the counts of criminal behavior and criminals. They are typically uniform data on offenses and offenders and are derived from records of official criminal justice agencies, from other agencies of control, and from unofficial sources such as surveys of victimization or criminal involvement. Particularly in the case of official crime statistics, they may be published annually or periodically in a relatively standard format of data presentation and analysis.


Official crime statistics are generated at different levels of government (municipal, state, and federal) by a variety of criminal justice agencies (police, court, and corrections) and at different stages in the criminal justice process (arrest, prosecution, conviction, imprisonment, and parole). Official statistics are also produced on the violation of laws, codes, and standards of a variety of administrative and regulatory agencies of control, primarily at the federal level. Official crime statistics are based on the records of those agencies that are the official registrars of criminal behavior and criminals.

Unofficial crime statistics are produced independently of the records of official agencies of crime control. The sources of these statistics are the records of private security and investigative agencies and the data collected by social scientists through experiments and observations, as well as through surveys of victimization and of self-reported criminal involvement.

Crime statistics emerged in the early nineteenth century as an adjunct to the administration of justice, the primary purpose being the measurement of the amount of crime, particularly ‘‘to know if crime had increased or decreased’’ in order to inform crime control policy and practice (Sellin and Wolfgang, p. 9). Early researchers pointed out the ultimately more important purpose of measuring the distribution of crime by a variety of social, demographic, and geographic characteristics. Both official and unofficial crime statistics have distinctive problems and sources of error, but a major one they share is the underestimation of the actual amount of crime. However, it is probable that the various measures generate similar distributions of crime, meaning that there is convergence rather than discrepancy in their depictions of the characteristics and correlates of crime. It is also likely that multiple indicators of crime best inform research, theory, policy, and practice.

The major types of official and unofficial crime statistics are discussed here in terms of their history and contemporary sources; their role as measures of crime; methodological and utilization issues and problems; and the general issue of discrepancy or convergence among crime statistics regarding the distribution and correlates of crime.

History of Crime Statistics

Simultaneously with the emergence of the discipline of statistics in the seventeenth century, the fledgling discipline’s luminaries began to call for crime statistics in order to ‘‘know the measure of vice and sin in the nation’’ (Sellin and Wolfgang, p. 7). It was not until the nineteenth century that the measurement of a nation’s moral health by means of statistics led to the development of the branch of statistics called ‘‘moral statistics.’’ France began systematically collecting national judicial statistics on prosecutions and convictions in 1825. For the first time, comprehensive data on crime were available to the overseers of moral health, as well as to researchers. The French data became the source of the first significant statistical studies of crime, by the Belgian Adolphe Quetelet and the Frenchman Andre Michel Guerry, who have been called the founders of the scientific sociological study of crime. Soon afterward, similar analytical and ecological studies of crime were carried out by other Europeans who were influenced directly by, and made frequent references to, the work of Quetelet and Guerry.

In the United States, the earliest crime statistics were state judicial statistics on prosecutions and convictions in court and on prisoners in state institutions. New York began collecting judicial statistics in 1829, and by the turn of the twentieth century twenty-four other states had instituted systems of court data collection. Prison statistics were first gathered in 1834 in Massachusetts, and twenty-three other states had begun the systematic collection of prison data by 1900 (Robinson). The early state data on imprisonment were augmented by the first national enumeration of persons institutionalized in prisons and jails as part of the 1850 census and by subsequent decennial (taken every ten years) population counts thereafter. These early United States Bureau of the Census statistics are relatively complete and informative, including for each prisoner the year and offense of commitment, sex, birthplace, age, race, occupation, and literacy status.

By the end of the nineteenth century, most European countries and a number of states in the United States were systematically collecting judicial and prison statistics, and concomitantly most of the problems relating to these statistics and the measurement of crime in general had been identified. Numerous critics pointed to the fact that judicial and prison statistics were ‘‘incomplete’’ measures of the actual amount and distribution of crime in the community, primarily because of the ‘‘dark figure’’ of undetected, unreported, unacted upon, or unrecorded crime. It has always been clear that not all crimes committed in the community come to the attention of the police, that only a portion of crimes known to the police eventuate in arrest, that not all offenders who have been arrested are prosecuted or convicted, and that only a small fraction of the cases where there is a conviction lead to imprisonment. This underestimation of the volume of crime is not necessarily problematic if, as Quetelet suggested, we ‘‘assume that there is a nearly invariable relationship between offenses known and adjudicated and the total unknown sum of offenses committed’’ (p. 18). In other words, if there is a constant ratio between the actual amount of crime (including the dark figure of unknown offenses) and officially recorded crime, whether recorded by arrest, prosecution, conviction, or imprisonment, then the latter is ‘‘representative’’ of the former and acceptable as a measure of crime. Later research showed this to be a fallacious assumption, but during the nineteenth century and through the first quarter of the twentieth century, scholars and practitioners alike generally operated under this assumption in using and defending judicial statistics as the true measure of crime in a society. Critics pointed to the fact that judicial statistics were not representative of the actual number of crimes or criminals in their proposals that police statistics, particularly of ‘‘offenses known to the police,’’ be used in the measurement of crime.

Beginning in 1857, Great Britain was the first nation to systematically collect police data, including offenses known to the police. The significance of this type of data was appreciated by only a few nineteenth-century scholars, among them Georg Mayr, the leading criminal statistician of the time. In 1867, he published the first statistical study using ‘‘crimes known to the police’’ as the primary data source, proposing that crimes known to the police should be the foundation of moral statistical data on crime (Sellin and Wolfgang, p. 14). A few researchers called for utilization of police statistics, but judicial statistics on prosecution and conviction remained the crime statistic of choice in studies of the amount and distribution of crime.

Although the origin, utilization, and defense of judicial statistics were a European enterprise, the emergence of police statistics as a legitimate and eventually favored index of crime can be characterized as an American endeavor. As a result of a growing dissatisfaction with judicial statistics and of the fact—axiomatic in criminology—that ‘‘the value of a crime rate for index purposes decreases as the distance from the crime itself in terms of procedure increases’’ (Sellin, p. 346), the American criminologist August Vollmer in 1920 proposed a national bureau of criminal records that, among other tasks, would compile data on crimes known to the police. In 1927 the International Association of Chiefs of Police made this suggestion an actuality by developing a plan for a national system of police statistics, including offenses known and arrests, collected from local police departments in each state. The Federal Bureau of Investigation became the clearinghouse for these statistics and published in 1931 the first of its now-annual Uniform Crime Reports (UCR). That same year, ‘‘offenses known to the police’’ was accorded even more legitimacy as a valid crime statistic by the Wickersham Commission, which stated that the ‘‘best index of the number and nature of offenses committed is police statistics showing offenses known to the police’’ (U.S. National Commission on Law Observance and Enforcement, p. 25). Ever since that time, ‘‘offenses known to the police’’ has generally been considered the best source of official crime data. However, most of the European countries that had developed national reporting systems of judicial statistics did not include police statistics, particularly crimes known, until the 1950s, and ironically, Great Britain did not acknowledge that crimes known to the police was a valid measure of crime until the mid-1930s, although these data had been collected since the mid-nineteenth century (Sellin and Wolfgang, pp. 18–21).

According to Thorsten Sellin’s axiom, ‘‘crimes known to the police’’ has the most value of all official measures of crime because it is closest procedurally to the actual crime committed, probably as close as an official crime statistic will ever be. Even so, as with each and every measure and crime statistic, there are problems regarding even this best of official crime statistics.

Official Crime Statistics

Contemporary official crime statistics, proliferating with the growth of crime-control bureaucracies and their need to keep records, are more comprehensive and varied than nineteenth-century judicial statistics and early twentieth-century police statistics. The purposes and functions of crime statistics have also changed.

Whereas the early judicial statistics were utilized to measure a nation’s moral health or the social and spatial distribution of crime, many of the more contemporary official statistics are the byproducts of criminal justice ‘‘administrative bookkeeping and accounting.’’ For example, data are collected on such matters as agency manpower, resources, expenditures, and physical facilities, as well as on warrants filed and death-row populations. Consequently, in the United States there are hundreds of national— and thousands of state and local—sources of official statistics, most of which are best characterized as data on the characteristics and procedures of the administration of criminal justice and crime control.

Given the different histories of judicial and police statistics in Europe and the United States, it is not surprising that in the latter there are relatively good police data compiled on a nationwide annual basis and relatively poor judicial data. In fact, the United States is one of only a few developed countries that publishes no national court statistics. Reflecting the unique history of corrections in the United States, where the state prison and local jail are differentiated by jurisdiction, incapacitative functions, type of inmate, and record-keeping practices, there are relatively comprehensive annual national data on the number and characteristics of adults under correctional supervision in state and federal prisons, but no national statistics on jail populations are published. A review of sources of criminal justice statistics concluded, ‘‘the absence of regular annual data on jail inmates is, along with the absence of court statistics, the most glaring gap in American criminal justice statistics’’ (Doleschal, p. 123).

Official crime statistics measure crime and crime control. Clearly, the historically preferred source of official statistics on the extent and nature of crime is police data, particularly crimes known to the police. Other official data gathered at points in the criminal justice system that are procedurally more distant from the crime committed are less valid and less useful measures of crime. However, these data can serve as measures of the number and social characteristics of those who are arrested, prosecuted, convicted, or imprisoned; of the characteristics, administration, and procedures of criminal justice within and between component agencies; and of the socially produced and recognized amount and distribution of crime. Official statistics, except for data on crimes known to the police, are more correctly regarded as measures of crime control because they record a social-control reaction of the criminal justice system to a known offense or offender. For example, a crime known to the police is typically reported to the police by a complainant, and the record of it is evidence of the detection of a crime. If the police clear the offense through arrest, the arrest record is evidence of the sanction of a criminal, a measure of crime control (Black). In other words, a crime known to the police registers acknowledgment of an offense; an arrest, of an offender; and a prosecution and conviction, of the offender. Arrest, prosecution, conviction, and disposition statistics, as well as administrative bookkeeping and accounting data, are best thought of as information on the characteristics, procedures, and processes of crime control. The focus in this research paper will be the official statistics of crime, specifically police statistics of offenses known.

A Measure of Crime: Offenses Known to The Police

From the beginning, the primary objective of the Uniform Crime Reports was made clear in 1929 by the Committee on Uniform Crime Records of the International Association of Chiefs of Police: to show the number and nature of criminal offenses committed. At the time it was argued that among the variety of official data, not only were "offenses known" closest to the crime itself, but a more constant relationship existed between offenses committed and offenses known to the police than between offenses committed and other official data; these assumptions were shown to be erroneous by victimization surveys many years later. Nevertheless, the UCR have always been the most widely consulted source of statistics on crime in the United States.

The UCR are published annually by the F.B.I. and provide statistics on the amount and distribution of crimes known to the police and arrests, with other, less complete data on clearances by arrest, average value stolen in a variety of property offenses, dispositions of offenders charged, number of law enforcement personnel, and officers assaulted or killed. The statistics are based on data submitted monthly by the fifteen thousand municipal, county, and state law enforcement agencies, which have jurisdiction over approximately 98 percent of the U.S. population.

Of crimes known and arrests, data are collected in twenty-nine categories of offenses, using standardized classifications of offenses and reporting procedures. Crimes known and arrests are presented for the eight original ‘‘index crimes’’—murder, rape, robbery, aggravated assault, burglary, larceny, motor-vehicle theft, and arson—and arrests only, for the remaining (nonindex) crimes. Arson was added as an index crime in 1979. For each index crime, crimes known to the police are presented by number of crimes reported, rate per hundred thousand population, clearances, nature of offense, geographical distribution (by state, region, size, and degree of urbanization), and number of offenders arrested. Arrests are presented by total estimate for each index and nonindex crime, rate per hundred thousand population, age, sex, race, and decade trend.

The index crimes are intended to represent serious and high-volume offenses. The Total Crime Index is the sum of index crimes, and subtotals are provided on violent and property index crimes. The Total Crime Index, and to a lesser extent the violent- and property-crime indexes, are often used to report national trends in the extent and nature of crime.

The statistics presented in the UCR of crimes known to the police are records of ‘‘reported’’ crime, and since reporting and recording procedures and practices are major sources of methodological and utilization problems, they deserve further attention. Crimes known to the police are typically offenses reported to the police by a victim or other person and are recorded as such unless they are ‘‘unfounded,’’ or false. For property crimes, one incident is counted as one crime, whereas for violent crimes one victim is counted as one crime. Except for arson, the most serious of more than one index crime committed during an incident is counted; arson is recorded even when other index crimes are committed during the same incident. For example, stealing three items from a store counts as one larceny, but beating up three people during an altercation counts as three assaults.

Larceny and motor-vehicle theft account for the largest proportion of index crimes, for reasons pointed to by critics. Both are the least serious of the index crimes, with larceny of any amount now eligible to be counted, and motor vehicle theft having one of the highest rates of victim reports to the police because theft must be established to file for insurance claims. On the other hand, many crimes that could be considered more serious because they involve physical injury and bodily harm to a victim are not index crimes. Moreover, completed and attempted crimes are counted equally for more than half of the index crimes. Robbery is counted as a violent crime and accounts for almost one-half of all reported violent index crimes. Most other countries classify robbery as a major larceny, as did the United States before the inception of the UCR. Of course, this difference in classification explains in part the relatively higher rate of violence in the United States. A number of other serious offenses are not counted at all in the reporting program, including a variety of victimless, white-collar, and organizational crimes, as well as political corruption, violations of economic regulations, and the whole array of federal crimes. One might characterize the Total Crime Index as a measure of the extent, nature, and trends of relatively ordinary street crime in the United States.

There are also some problems in the presentation of these data. The Total Crime Index, as a simple sum of index offenses, cannot be sensitive to the differential seriousness of its constituent offense categories and to the relative contributions made by frequency and seriousness of offenses to any index of a community’s crime problem (Sellin and Wolfgang). Rudimentary summations of data also mask potentially important variations among offenses and other factors. Comparisons of data from year to year, and even from table to table for the same year, may be hampered in some cases because data may be analyzed in various ways (for example, by aggregating data in different ways for different tables). Comparisons are also made difficult by the use of inappropriate bases (or denominators) in the computation of the rates that are presented both for crimes known to the police and for arrests.

The crime rates given in the UCR, as well as in most criminal justice statistical series, are computed as the number of crimes per year per hundred thousand population. This type of ‘‘crude’’ rate can lead to inappropriate inferences from the data. The use of crude rates can conceal variation in subgroups of the population, so it is desirable to standardize rates for subgroups whose experience is known to be different, for example, by sex, race, and age. These subgroup-specific rates also facilitate comparisons between groups: male-female rates, white-black rates, and juvenile-adult rates.

At times, inappropriate population bases are used in calculating rates. A crime rate represents the ratio of the number of crimes committed to the number of persons available and able to commit those crimes; this ratio is then standardized by one hundred thousand of whoever is included in the base. For some offenses, the total population is an inappropriate base. For example, a forcible-rape crime rate based on the total population is less appropriate than a rate based on the number of males available and able to commit rape. Similarly, the juvenile crime rate should reflect the number of crimes committed by the population of juveniles.

Crime rates can be interpreted as victimization rates, depending on who (or what) are included in the base. If the total population base can be considered potential criminals, they can also be considered potential victims. For crimes where the victim is a person, the calculation of surrogate victimization rates using crime data is relatively straightforward—the number of available victims becomes the base. Again, in the case of forcible rape, the total population and the male population would be inappropriate bases— here the population of available victims is essentially female. Therefore, the surrogate victimization rate would be calculated as the number of forcible rapes known to the police per hundred thousand female population.
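
A small numerical sketch of the rate conventions just described: a crude rate per hundred thousand total population, a subgroup-specific (offender-base) rate, and a surrogate victimization rate using the at-risk population as the base. All counts and population figures below are invented purely for illustration.

```python
def rate_per_100k(offenses, population):
    """Offenses per 100,000 of whichever base population is supplied."""
    return offenses / population * 100_000


# Invented figures for one jurisdiction in one year.
total_pop, male_pop, female_pop = 1_000_000, 490_000, 510_000
forcible_rapes_known = 400

crude_rate = rate_per_100k(forcible_rapes_known, total_pop)         # 40.0 per 100,000
offender_base_rate = rate_per_100k(forcible_rapes_known, male_pop)  # ~81.6 per 100,000
victim_base_rate = rate_per_100k(forcible_rapes_known, female_pop)  # ~78.4 per 100,000

print(crude_rate, offender_base_rate, victim_base_rate)
```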

For property crimes it is more difficult, but not impossible, to calculate surrogate victimization rates. Here the denominator may have to be reconceptualized not as a population base but as a property base. For example, Boggs, and later Cohen and Felson, included ‘‘opportunities’’ for property theft in the bases of their analyses, including, for example, the number of cars available to steal. They reported that the subsequent opportunity-standardized rates were very different from the traditional population-standardized crime (or victimization) rates. Opportunity-standardized rates may sometimes differ even in direction. For example, rather than showing the rate of motor vehicle–related theft increasing, a corrected rate showed it to be decreasing (Wilkins; Sparks). Ultimately, of course, much more precise victimization rates are available from victimization survey data.
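
The sketch below, with invented numbers, contrasts a population-based motor-vehicle-theft rate with an opportunity-based rate that uses vehicles available to steal as the denominator, in the spirit of the opportunity-based adjustments by Boggs and by Cohen and Felson described above.

```python
# Invented figures for two periods, used only to show how the choice of
# base can change the apparent direction of a trend.
data = {
    "earlier period": {"thefts": 900, "population": 500_000, "vehicles": 150_000},
    "later period": {"thefts": 1_100, "population": 520_000, "vehicles": 260_000},
}

for period, d in data.items():
    per_person = d["thefts"] / d["population"] * 100_000
    per_vehicle = d["thefts"] / d["vehicles"] * 100_000
    print(f"{period}: {per_person:6.1f} per 100,000 persons | "
          f"{per_vehicle:6.1f} per 100,000 vehicles")
```

With these hypothetical figures the population-based rate rises while the vehicle-based rate falls, the kind of reversal in direction that the text attributes to the corrected motor-vehicle-theft rates.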

Finally, the total population base may be used incorrectly if the decennial Bureau of the Census counts of the population are not adjusted for projected population estimates on a yearly basis. For example, if the 1990 census data are used in the base to calculate 1999 crime rates, the rates will be artificially inflated simply as a consequence of using too small a population base. Obviously, 1999 population estimates are more appropriate in the calculation of 1999 crime rates.
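
A small arithmetic sketch of the inflation effect, assuming hypothetical figures: holding the number of crimes fixed, dividing by an out-of-date (smaller) decennial count yields a higher rate than dividing by an estimate of the current population.

```python
# Hypothetical figures; the point is only that a too-small base inflates the rate.
crimes_1999 = 30_000
census_1990_population = 950_000        # decennial count, now out of date
estimated_1999_population = 1_050_000   # intercensal estimate

rate_stale_base = crimes_1999 / census_1990_population * 100_000
rate_adjusted_base = crimes_1999 / estimated_1999_population * 100_000

print(f"using the 1990 census base:   {rate_stale_base:.1f} per 100,000")     # about 3157.9
print(f"using a 1999 estimated base:  {rate_adjusted_base:.1f} per 100,000")  # about 2857.1
```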

Overall, the data presented in the UCR are ‘‘representative.’’ However, the greatest threat to the validity of these statistics is differential reporting to the F.B.I. by local police, within participating departments, and to local authorities by citizen victims or other complainants. There is underreporting both by and to the police.

The reports of participating law enforcement agencies to the F.B.I. can be affected in a variety of ways, leading to variations in the uniformity, accuracy, and completeness of reporting. In spite of efforts to standardize definitions of eligible offenses, police in different states with different statutory and operational definitions of offense categories may classify crimes differently. There may be internal and external pressures on police agencies to demonstrate reductions in community crime or specific targeted offenses, and these pressures may induce police to alter classification, recording, and reporting procedures. Such changes can have a dramatic impact on the amount and rate of crime. A classic example was the reorganization of the Chicago police department. As part of the reorganization, more efficient reporting and recording procedures were introduced, and reported crime increased dramatically from 57,000 offenses in one year to 130,000 in the next (Rae).

To make the problems with the UCR even more complicated, the reported statistics can vary across time and place as policies change, police technology becomes more sophisticated, laws and statutes are modified, commitment to the program wavers, demands for information change, available resources fluctuate, and so on (Hindelang). Unfortunately, even if all the difficulties of validity, reliability, and comparability were eliminated and the statistics became completely and uniformly accurate, there would remain the more serious problem of differential reporting to the police by victims and other citizens. There is evidence of substantial underreporting and nonreporting to the police by victims of crime; in fact, the majority of crimes committed are not reported to the police.

The assumption of the originators of the UCR that there is a constant ratio between crimes known to the police and crimes committed has been shown to be fallacious by studies using unofficial crime statistics. One may never know the actual volume of crimes committed, and therefore the true base remains indeterminate. But more importantly, underreporting or nonreporting to the police varies by offense type, victim and offender characteristics, perceptions of police efficiency, and the like. In short, the dark figure of undetected and unreported crime limits the adequacy of even the historically preferred crimes-known-to-the-police index of the amount and distribution of crime.

During the 1980s, law enforcement agencies sought to improve official reporting methods, particularly the UCR. In 1985 the Bureau of Justice Statistics (BJS) and the F.B.I. released Blueprint for the Future of the Uniform Crime Reporting Program (Reaves, p. 1). This blueprint outlined the next generation of official reporting methods, specifically the National Incident-Based Reporting System (NIBRS). Starting with 1991 data, the UCR program began to move to this more comprehensive reporting system. While the UCR is essentially offender-based, focusing on summary accounts of case and offender characteristics, the NIBRS is incident-based, seeking to link more expansive data elements to the crime, organized into six primary segments: administrative, offense, property, victim, offender, and arrestee (Reaves, p. 2).

The first segment, the administrative, is a linking segment that provides an identifier for each incident. Further, this segment provides the date and time of the original incident as well as any required special clearance for the case. The second segment, the offense, details the nature of the offense(s) reported. Unlike the UCR, which is limited to a relatively small number of F.B.I. index crimes, NIBRS provides details on forty-six offenses. This specificity allows for more accurate reporting of the offense, as well as improved ability to analyze other characteristics of the crime. The offense category also examines conditions surrounding the event, such as drug or alcohol involvement at the time of the incident, what type of weapon, if any, was used, and whether or not the crime was completed. Segment three deals with the property aspects of the incident, such as the nature of the property loss (i.e., burned, seized, stolen), the type of property involved (i.e., cash, car, jewelry), the value of the property, and if the property was recovered and when. The fourth segment, victim, lists the characteristics of the individual victimized in the incident. The victim’s sex, age, race, ethnicity, and resident status are presented and, in cases where the victim is not an individual, additional codes for business, government, and religious organizations are provided. Each of the victims is linked to the offender, by the offender number and by the relationship between the victim and the offender. Segment five focuses on the individual attributes of the offender rather than the victim. The final segment, arrestee, gives information on those arrested for the incident—the date/time of the arrest, how the arrest was accomplished, whether or not the arrestee was armed, and age, gender, race, ethnicity, and residence status of the arrestee (Reaves).

Data collection for NIBRS follows a process similar to that of the UCR, with local agencies reporting to the state program, which passes the information along to the F.B.I. However, one major change exists for those states desiring to participate in the NIBRS program—to begin regular submission of NIBRS data a state must be certified by the F.B.I. (Roberts, p. 7). The state must meet the following four criteria before becoming certified: First, the state must have an error-reporting rate of no greater than 4 percent for three consecutive months. Second, the data must be statistically reasonable based on trends, volumes, and monthly fluctuations. Third, the state must show the ability to update and respond to changes within the system. And finally, the state NIBRS program must be systematically compatible with the national program.
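
As a simple illustration of the first, purely quantitative criterion, the sketch below checks whether a state's monthly error-reporting rates stay at or below 4 percent for three consecutive months. The sample rates are invented, and the check is only one plain reading of the criterion as stated above, not an official F.B.I. certification procedure.

```python
# Hypothetical monthly error-reporting rates (percent); not real submission data.
monthly_error_rates = [5.2, 3.9, 3.1, 2.8, 4.4, 3.7]

def meets_error_criterion(rates, threshold=4.0, run_length=3):
    """True if some run of `run_length` consecutive months is at or below `threshold`."""
    for i in range(len(rates) - run_length + 1):
        if all(r <= threshold for r in rates[i:i + run_length]):
            return True
    return False

print(meets_error_criterion(monthly_error_rates))  # True: months 2-4 are 3.9, 3.1, and 2.8
```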

Calls for Service

Another method by which crime may be monitored utilizes emergency calls to the police. Some of the criticisms that have been leveled at arrest records (e.g., that they measure reactions to crimes rather than criminal involvement) and at victimization surveys (e.g., there may be systematic bias in the willingness of victims to report certain crimes to interviewers) may be addressed by this method of crime measurement.

It is suggested that the primary advantage of measuring crime through calls-for-service (CFS) is that it places the data closer to the actual incident. This removes additional layers in which bias or data loss can occur. For example, in order for a crime to be recorded as an arrest, the police must respond to the call, investigate the crime, and find and arrest a suspect. At any one of these steps the process can be halted and nothing recorded, hiding the occurrence of the crime and contributing to the ‘‘dark figure.’’ Similarly, within victimization surveys the respondent may forget events in the past, or the victim may choose not to give accurate information due to the sensitive nature of the crime. Placing the data gathering at the point of actually reporting the event to authorities means that the reports are ‘‘virtually unscreened and therefore not susceptible to police biases’’ (Warner and Pierce, p. 496).

As potentially valuable as these calls for service are, several weaknesses exist that create difficulty in utilizing them as crime measures. The first relates to the coding of a call to the police. Not all calls for police service focus on crime or legal issues—many are calls for medical or physical assistance, general emergencies, information requests, or ‘‘crank’’ calls. Clearly, they do not measure crime. At first glance, these appear easy to filter out of the data. But there are ambiguities; for example, how are medical conditions brought on by illegal activity (an individual consumes an illegal substance and has a negative reaction) coded? The call to 911 requests medical assistance but fails to mention the origin of the medical distress. The code for this event appears to be medical but in fact also represents a crime. Another concern surrounding the coding of a call has to do with the accuracy of what the individual is reporting to the operator. Not only can this lead to problems in the general categorization of a call as a crime (or not), but also in the specific crime being reported. The caller may not understand the nature of the event they are witness to, or the caller may desire a faster response from the police and inflate the severity of the offense. Further, even if citizens have a sound understanding of the legal nature of the event, they may be unable to articulate critical features of the event (e.g., because of high levels of anxiety or because English is a second language).

A second major issue with CFS as a measure of crime is that many crimes come to the attention of the police by methods other than phone calls to the police department. Officers can observe crimes while patrolling, or information can be directly presented to officers by citizens while on patrol or at station houses (Reiss). This creates a new aspect of the ‘‘dark figure’’—reliance on CFS may tend to undercount crimes as police officers discover criminal activity through other means. Klinger and Bridges found that officers sent out on calls encountered more crime than was reflected in the initial coding of those calls (Klinger and Bridges, p. 719).

A final consideration in using CFS as a crime measure is that calls tend to vary according to the structural characteristics of the neighborhood. Where residents believe police respond slowly to their calls, where residents are more fearful of crime, and where there exists more criminal victimization, there will be systematic variation in the presentation of calls for service (Klinger and Bridges). For example, in high crime rate areas, residents may be sensitized to crime and report all behavior, resulting in many false positives. Thus, differences in CFS across communities may introduce additional sources of error.

Utilization of CFS data is an innovative approach to the measurement of crime. However, until the strengths and weaknesses of these records of initial calls to the police for a variety of services have been scrutinized to the same degree as the more traditional measures, some caution should be exercised with this ‘‘new’’ indicator of crime.

Accessing Official Reports

One of the most significant changes in using crime data over the past decade reflects the means of access to official crime statistics. Traditionally, in order to gather official crime figures one has either to rely on published documents (e.g., the annual UCR) or to contact the agency (federal, state, or local) that is the repository of the data and request access to the appropriate information. With the expansion of the Internet, many of these same agencies have placed their crime data online.

At the federal level, one example among many is the Bureau of Justice Statistics (www.ojp.usdoj.gov/bjs/). The BJS provides aggregate level data for the United States, in both absolute figures and rates for index and nonindex crimes. The BJS also provides data by region. Thus information, such as the UCR, can be accessed directly and within a format that allows for easy comparison between regions and over time.

Likewise, various state agencies have started to provide crime and criminal justice information on their web sites. For example, the California Attorney General (www.caag.state.ca.us/) provides information on crimes within the state from 1988 until the present and presents comparative information on other states over the same period of time. The Texas Department of Criminal Justice (www.tdcj.state.tx.us/) not only provides information on crime rates within the state, but also gives statistics on demographic and offense characteristics of prisoners, including those on death row.

Even local agencies, such as city police departments, provide crime statistics. For example, the San Diego Police Department (www.sannet.gov/police) provides a breakdown of total aggregate crime citywide, the respective rates of each crime, and the geographic distribution of crime citywide and for specific areas within the city. The Dallas Police Department (www.ci.dallas.tx.us/dpd/) presents a map of the city divided into ‘‘reporting areas’’ that allows the viewer to select an area in which they are specifically interested and to gather the relevant crime data.

The expansion of the Internet and its utilization by law enforcement agencies facilitate access to criminal justice data sources and statistics within minutes rather than days or weeks. The information is provided typically within a spreadsheet format or simple tables. This makes the utilization of information on crime much easier, for law enforcement, researchers, and the public.

Applications of Official Data

A number of computerized information and data management systems have been created to facilitate both the apprehension of offenders and research on crime. They are typically local or regional efforts, providing law enforcement agencies in particular the capacity to store, manage, and utilize individual-level, comprehensive record information on case characteristics, offenders, victims, tips, crime locations, and so on. The goal, in increasing the quantity and quality of information available to law enforcement agencies, is to enhance the effectiveness and efficiency of the criminal justice system. One example of this type of data system is the Homicide Investigation and Tracking System (HITS).

The HITS program was originally funded under a National Institute of Justice (NIJ) grant that sought to examine homicides and their investigations within Washington State (Keppel and Weis). A computerized information system, which included all homicides in the state from 1980 forward, was created to facilitate the examination of solvability factors in homicide investigations, as well as to provide a comprehensive, ongoing database to be used by investigators to inform and enhance their case investigations. This was accomplished by having law enforcement officers fill out a standardized case form, which contains hundreds of pieces of information, on the victim(s), offender(s), locations, time line, motives, cause of death, autopsy results, evidence, and so on. In effect, a digitized version of the most relevant features of a case file was input into the database.

The HITS program contains information from six major sources and is stored in seven different data files: murder, sexual assault, preliminary information, Department of Corrections, gang-related crimes, Violent Criminal Apprehension Program, and timeline. These data files can then be queried by the investigator for a wide range of information, such as victims’ gender, race, and lifestyle; date and cause of death; location of body; and other similar characteristics. This allows investigators to make their search as wide or narrow as the case demands, in order to improve their ability to focus on an offender.

HITS also provides an excellent source of information for researchers. The HITS program provides initial official reports of crimes that are generated in close temporal proximity to the crime. Also, because the HITS program maintains separate databases of information provided by public sources (e.g., licensing, corrections, motor vehicles), researchers can link separate sources of information on the same case, improving the analysis of the crime. The query system also allows researchers to create aggregate data sets, adjusting for a range of variables such as time of year or day, location within state, mobility of offenders, and so on. The HITS system, and others like it, then, not only improve the ability of law enforcement to solve crimes but also the ability of researchers to analyze them.

Unofficial Crime Statistics

Even though most of the fundamental problems with official crime statistics had been identified before the end of the nineteenth century, including the major problem of the dark figure of unknown crime, it was not until the mid-twentieth century that systematic attempts to unravel some of the mysteries of official statistics were initiated. Turning to data sources outside of the official agencies of criminal justice, unofficial crime statistics were generated in order to explore the dark figure of crime that did not become known to the police, to create measures of crime that were independent of the official registrars of crime and crime control, and to address more general validity and reliability issues in the measurement of crime.

There are two categories of unofficial data sources: social-science and private-agency records. The first of these is much more important and useful. Among the social-science sources, there are two major, significant measures, both utilizing survey methods. The first is self-reports of criminal involvement, which were initially used in the 1940s to ‘‘expose’’ the amount of hidden crime. The second is surveys of victimization, the most recent and probably the most important and influential of the unofficial crime statistics. Victimization surveys were initiated in the mid-1960s to ‘‘illuminate’’—that is, to specify rather than simply to expose—the dark figure and to depict crime from the victim’s perspective. There are also two minor, much less significant sources of social-science data: observation studies of crime or criminal justice, and experiments on deviant behavior. Among the sources of private agency records are those compiled by firms or industries to monitor property losses, injuries, or claims; by private security organizations; and by national trade associations. The focus here will be on the social-science sources of unofficial crime statistics, particularly victimization and self-report surveys.

Victimization Surveys

Recognizing the inadequacies of official measures of crime, particularly the apparently substantial volume of crime and victimization that remains unknown to, and therefore unacted upon by, criminal justice authorities, the President’s Commission on Law Enforcement and Administration of Justice initiated relatively small-scale pilot victimization surveys in 1966. One was conducted in Washington, D.C. A sample of police precincts with medium and high crime rates was selected, within which interviews were conducted with residents. Respondents were asked whether in the past year they had been a victim of any of the index crimes and of assorted nonindex crimes. Another surveyed business establishments in the same precincts in Washington, D.C., as well as businesses and residents in a sample of high-crime-rate precincts in Boston and Chicago. The instruments and procedures used in the first pilot survey were modified and used to interview residents in the second study. Owners and managers of businesses were asked whether their organization had been victimized by burglary, robbery, or shoplifting during the past year. The third pilot survey was a national poll of a representative sample of ten thousand households. Again, respondents were interviewed and asked whether they or anyone living with them had been a victim of index and nonindex crimes during the past year. They were also asked their opinions regarding the police and the perceived personal risk of being victimized.

The pilot studies verified empirically what criminologists had known intuitively since the early nineteenth century—that official crime statistics, even of crimes known to the police, underestimate the actual amount of crime. However, these victimization studies showed that the dark figure of hidden crime was substantially larger than expected. In the Washington, D.C., study, the ratio of reported total victim incidents to crimes known to the police was more than twenty to one (Biderman). This dramatic ratio of hidden victimizations to reported crimes was replicated among the individual victims in the Boston and Chicago study (Reiss) and in the national pilot survey, which showed that about half of the victimizations were not reported to the police (Ennis). The survey of business establishments discovered the inadequacy of business records as measures of crime, showed higher rates of victimization than police records indicated, and verified the valid reporting of business victimization by respondents (Reiss). These studies also demonstrated that the discrepancy between the number of victimizations and of crimes reported to the police varies importantly by the type of offense and by the victim’s belief that reporting a crime will have consequences. In general, the more serious the crime, the more likely a victim is to report it to the police; minor property crimes are reported least frequently. As a result of the startling findings of these pilot victimization surveys and of the subsequent recommendations of the President’s Commission, an annual national victimization survey, the National Crime Victimization Survey (NCVS), was initiated.

In 1972 the United States became one of the few countries to carry out annual national victimization surveys. The NCVS is sponsored by the Bureau of Justice Statistics (within the United States Department of Justice) and is conducted by the Bureau of the Census. Its primary purpose is to ‘‘measure the annual change in crime incidents for a limited set of major crimes and to characterize some of the socioeconomic aspects of both the reported events and their victims’’ (Penick and Owens, p. 220). In short, the survey is designed to measure the amount, distribution, and trends of victimization and, therefore, of crime.

The survey covers a representative national sample of approximately sixty thousand households, and through 1976 it included a probability sample of some fifteen thousand business establishments. Within each household, all occupants fourteen years of age or older are interviewed, and information on twelve- and thirteen-year-old occupants is gathered from an older occupant. Interviews are conducted every six months for three years, after which a different household is interviewed, in a constant process of sample entry and replacement.

The crimes measured in the NCVS are personal crimes (rape, robbery, assault, and theft), household crimes (burglary, larceny, and motor vehicle theft), and business crimes (burglary and robbery). These crimes were selected intentionally for their similarity to the index crimes of the UCR in order to permit important comparisons between the two data sets. The only two index crimes missing are murder, for which no victim can report victimization, and arson, the ostensible victim of which is often the perpetrator.

The statistics on victimization generated by the NCVS provide an extremely important additional perspective on crime in the United States. Ever since they were first published, the survey’s reports have forced a revision in thinking about crime. For example, a report on victimization in eight American cities, using data from the very first surveys, provided striking confirmation of the magnitude of the underreporting and nonreporting problem identified in the pilot projects. Comparing the rates of victimization and crimes known to the police, the victimization data showed fifteen times as much assault, nine times more robbery, seven times the amount of rape, and, surprisingly, five times more motor-vehicle theft than reported in the UCR for the same period (U.S. Department of Justice).

Some of the discrepancy in the two rates can be accounted for by the practices of the police— not viewing a reported offense as a crime, failing to react, and not counting and recording it. But since the time of the pilot research it has been clear that the major reason for the discrepancy is the reporting practices of victims: the pilot national survey reported that approximately 50 percent of victimizations are not reported to the police (Ennis). An analysis of preliminary data from the first NCVS in 1973 concluded that nonreporting by victims accounted for much more of the difference between victimization and official crime rates than did nonrecording by the police. Almost three-fourths (72 percent) of the crime incidents are not reported to the police, ranging from a nonreporting rate of 32 percent for motor-vehicle theft to a rate of 82 percent for larceny (Skogan).

The primary reasons for citizen hesitancy to report crime to the police are relatively clear— the victim does not believe that reporting will make any difference (35 percent) or that the crime is not serious enough to bring to the attention of authorities (31 percent) (U.S. Bureau of the Census). The less serious crimes, particularly minor property crimes, are less often reported to the police, and the more serious ones are reported more often. Paradoxically, some of the more serious personal crimes, including aggravated assault or rape, are not reported because a personal relationship between victim and perpetrator is being protected or is the source of potential retribution and further harm (Hindelang, Gottfredson, and Garofalo). Another crime, arson, presents the problems of potential overreporting and of distinguishing between victim and perpetrator, since collecting insurance money is often the motive in burning one’s own property.

The NCVS does not merely provide another national index of crime, a view of crime from the perspective of the victim, and illumination of the dark figure of hidden crime. It has also contributed to a better understanding of crime in the United States, forcing scholars and criminal justice professionals alike to question many basic assumptions about crime. Perhaps most perplexing are the implications of the victimization trend data. From 1973 to the 1990s, the overall victimization rate remained relatively stable from year to year, whereas the UCR showed a more inconsistent and upward trend. It was not until the observed decline in crime from about 1992 until 2001 that the UCR and NCVS both showed the same trend, due perhaps to refinements in both systems. However, there are a number of possible interpretations for the differences, centering on the relative strengths and weaknesses of official records of crimes known to the police, as compared to unofficial victim reports.

In general, victimization surveys have the same problems and threats to validity and reliability as any other social-science survey, as well as some that are specific to the NCVS. Ironically, there is a ‘‘double dark figure’’ of hidden crime— crime that is not reported to interviewers in victimization surveys designed to uncover crimes not reported to the police! Such incomplete reporting of victimization means that victimization surveys, like official data sources, also underestimate the true amount of crime. Of course, this suggests that the discrepancy between the crime rate estimates of the NCVS and of the UCR is even larger than reports indicate.

A number of factors contribute to this doubly dark figure of unreported victims. One of the most difficult problems in victimization surveys is to anchor the reported crime within the six-month response frame. A respondent not only has to remember the crime incident, but must also specify when it took place during the past six months. The longer the period of time between the crime and the interview, the more likely memory is to fail: a respondent may either forget an incident completely or not remember some important details about the victimization. The less serious and more common offenses are less worth remembering because of their more trivial nature and ephemeral consequences. The concern and tolerance levels of victims may also affect their recollection of crime incidents. Moreover, telescoping may take place: the victimization may be moved forward or backward in time, from one period to another. A victim knows that a crime took place but cannot recall precisely when. Another source of inaccurate and inconsistent responses is deceit. Some respondents may simply lie, or at least shade their answers. There are many reasons for deceit, including embarrassment, social desirability (the wish to make a socially desirable response), interviewer-respondent mistrust, personal aggrandizement, attempts to protect the perpetrator, disinterest, and lack of motivation. Memory decay and telescoping are neither intentional nor manipulative, and are therefore more random in their effects on responses. They are likely to contribute to the underestimation of victimization. However, deceit is intentional and manipulative, and it is more likely to characterize the responses of those who have a reason to hide or reveal something. Its effects on victimization estimates are more unpredictable because deceit may lead to underreporting among some respondents but to overreporting among others. One can assess the extent of underreporting through devices such as a ‘‘reverse record check,’’ by means of which respondents who have reported crimes to the police are included in the survey sample (Turner; Hindelang, Hirschi, and Weis, 1981). Comparing a respondent’s crime incidents reported in the victimization interview with those reported to the police provides a measure of underreporting. A problem, though, is that underreporting can be validated more easily than overreporting. One ‘‘underreports’’ crimes that actually took place. For every official crime known to the police of a particular offense category, one can be relatively certain of underreporting if no victimization is reported for that offense category. If more victimizations are reported for an offense category than are known to the police, one cannot know whether the respondent is overreporting. A person may ‘‘overreport’’ crimes that never took place—they cannot be known, verified, or validated.

One of the strengths of the NCVS, namely, that the crimes included in the questionnaire are F.B.I. index crimes, is also a problem. In addition to the fact that two of the index crimes (murder and arson) are not included, many other important crimes are not measured in the victimization surveys. Obviously, the whole array of crimes without victims is excluded, as are the nonindex crimes and crimes not included in the UCR program. The result is that the victimization statistics are somewhat limited in their representativeness and generalizability.

An important limitation of the design of the NCVS is a strength of the UCR—its almost complete coverage (98 percent) of the total United States population and the resultant ability to examine the geographic and ecological distribution of crime from the national level to the levels of regions, states, counties, Standard Metropolitan Statistical Areas, cities, and local communities. Historically, data on victimization have been collected from a sample of the population, which has varied around 100,000 respondents, distributed geographically throughout the United States. There are simply not enough data to generate meaningful and useful statistics for each of the geographic and ecological units represented in the UCR. This would require a comprehensive census of households, the cost of which would be prohibitive.

Another design problem is referred to as bounding, or the time frame used as the reference period in interviews, which is established at the first interview on a six-month cycle for ‘‘household location.’’ This is done to fix the empirically determined optimum recall period of six months and to avoid double reporting of the same crime incident by respondents. The bounding of household locations rather than of the occupants of the household has also been a problem. If the occupants move, the new occupants are not bounded, and it has been estimated that about 10 to 15 percent of the sample consists of unbounded households. This factor, coupled with the mobility of the sample, creates a related problem: complete data records covering the three-year span of each panel are available for perhaps only 20 percent of the respondents. This restricts general data analysis possibilities, particularly the feasibility and utility of these data for longitudinal analyses of victimization experiences (Fienberg).

Finally, there are the inevitable counting problems: When there is more than one perpetrator involved in a crime, it is particularly difficult for respondents to report the number of victimizers with accuracy. The typical impersonality of a household burglary makes it impossible for a victim to know the number or characteristics of the burglars. Even as personal a crime as aggravated assault often presents the victim with problems in accurately recalling his perceptions when more than one person attempted or did physical injury to his body. The respondent’s reports, then, may be less accurate when the perpetrator could not be seen or when there was more than one observable perpetrator. If a respondent reports multiple victimizers in a crime incident, whether a property crime or violent crime, it counts as one victimization—the general counting rule is ‘‘one victim, one crime.’’ By itself this is not necessarily problematic, but if one compares victimization rates and official crime rates for property offenses (for which the UCR counting rule is ‘‘one incident, one crime’’), there may be sufficient noncomparability of units to jeopardize the validity of the comparison. For example, a three-victim larceny would yield three reports of victimization but only one crime known to the police. A three-victim assault would yield three of each and present fewer problems of comparability. The perspectives of the victim and the police are different, as are those of the NCVS and the UCR in counting and recording crime incidents, with different statistical outcomes and interpretations.
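
To make the comparability problem concrete, the following sketch counts the same hypothetical incidents under the two rules described above: a survey-style ‘‘one victim, one crime’’ count and a UCR-style count in which property offenses are counted once per incident while violent offenses are counted once per victim. The incident list is invented for illustration.

```python
# Hypothetical incidents: (offense, category, number of victims).
incidents = [
    ("larceny", "property", 3),   # a three-victim larceny
    ("assault", "violent", 3),    # a three-victim assault
]

# Survey-style counting: one victimization per victim, regardless of offense.
victimizations = sum(victims for _, _, victims in incidents)

# UCR-style counting: property offenses once per incident, violent offenses once per victim.
crimes_known = sum(1 if category == "property" else victims
                   for _, category, victims in incidents)

print(f"victimizations reported: {victimizations}")  # 6
print(f"crimes known to police:  {crimes_known}")    # 4
```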

A more serious counting problem involves series victimizations or rapid, repeated similar victimization of an individual. For a victim, it can be very difficult to separate one crime from another if they are very similar and happen within a compressed time period. The consequence is that validity suffers and there is a tendency to ‘‘blur’’ the incidents and to further underestimate the number of victimizations. The questionnaire separates single and series incidents, which are defined as three or more similar crimes that the respondent cannot differentiate in time or place of occurrence. Early publications of the NCVS excluded these series victimizations from the published victimization rates, raising the possibility that the rates are underestimations. Even more of the dark figure of hidden crime might be illuminated. If this and other problems with victimization surveys are resolved, the discrepancy between the amount of crime committed and the amount eventually reported to the police may become more substantial. There is little evidence that victims (except those of forcible rape) are changing their patterns of reporting crimes to the police, but there is mounting and more rigorous evidence that our ability to measure the amount and distribution of the dark figure of unreported crime is improving.

Self-Report Surveys

Surveys of self-reported criminal involvement are an important part of the improved capacity to illuminate the dark figure, in this case from the perspective of the criminal (or victimizer). The origin of self-report surveys predated victimization surveys by more than twenty years. Preliminary, groundbreaking research on self-reported hidden crime was conducted in the 1940s, but the method of simply asking someone about the nature and extent of his own criminal involvement did not become a recognized standard procedure until the late 1950s, with the work of James Short and Ivan Nye.

Austin Porterfield first used this variation of the survey research method of ‘‘self-disclosure’’ by a respondent in 1946, to compare the self-reports of college students regarding their delinquent activities while they were in high school with self-reports of delinquents processed through the juvenile court. Not only were more offenses admitted than detected, but also, more significantly, it appeared that the college students had been involved in delinquency during their adolescence in ways similar to those of the officially defined delinquents. These findings suggested that the distinction between delinquent and nondelinquent was not dichotomous, but rather more continuous, and that crime was perhaps distributed more evenly in the American social structure than official statistics would suggest. Fred Murphy, Mary Shirley, and Helen Witmer reported in 1946 that the admissions of delinquent activities by boys who participated in a delinquency prevention experiment significantly surpassed the number of offenses that came to the attention of juvenile authorities. James Wallerstein and Clement Wyle conducted a study that remains unique in self-report research because it surveyed a sample of adults in 1947. They discovered that more than 90 percent of their sample of about fifteen hundred upper-income ‘‘law-abiding’’ adults admitted having committed at least one of forty-nine crimes included in the questionnaire.

These early self-report survey findings confirm empirically what criminal statisticians, law enforcement authorities, and even the public had known since the time of Quetelet—that a substantial volume of crime never comes to the attention of the criminal justice system. The hint that some of this invisible crime is committed by persons who are not usually considered candidates for official recognition as criminals was even more revelatory and intriguing, but remained dormant for a decade.

Heeding suggestions that criminology needed a ‘‘Kinsey Report’’ on juvenile delinquency, Short and Nye in 1957 developed an anonymous, self-administered questionnaire that contained a checklist of delinquent acts, which was administered to populations of students and incarcerated delinquents. Their research had a more profound and longer-lasting impact because it was tied to theory-testing and construction (Nye) and, more importantly, because it provocatively verified the hint only alluded to in the earlier self-report studies—that crime is not disproportionately a phenomenon of the poor, as suggested by official crime statistics. The self-report data were apparently discrepant with the official data because they showed that self-reported delinquency was more evenly distributed across the socioeconomic status scale than official delinquency. This one provocative finding called into question the correlates and theories of juvenile delinquency and crime because most were based on official crime statistics and that period’s depiction of crime and delinquency as a phenomenon of the poor. The controversy set off by the work of Short and Nye still continues. Literally hundreds of similar studies have been carried out since Short and Nye’s pioneering work, most with similar results: there is an enormous amount of self-reported crime that never comes to the attention of the police; a minority of offenders commits a majority of the offenses, including the more serious crimes; the more frequently one commits crimes, the more likely is the commission of serious crimes; and those most frequently and seriously involved in crime are most likely to have official records. Self-report researchers have tended to assume that self-reports are valid and reliable—certainly more so than official measures. Ever since the mid-1960s, work critical of criminal justice agencies and of official crime statistics has generated further support for these assumptions. A few theorists, such as Travis Hirschi, even constructed and tested delinquency theories based on self-report measures and their results.

It has been suggested that ‘‘confessional data are at least as weak as the official statistics they were supposed to improve upon’’ (Nettler, p. 107). This criticism is damning to the extent that the statistics produced by the self-report method are no more valid and reliable as measures of crime than official statistics: if one rejects official statistics, then one should also question the adequacy of self-report statistics. Furthermore, as with official records and victimization surveys, there are a number of problems with the self-report method. Some of these are problems shared by victimization surveys and self-report surveys, and others are unique to the latter. The shared problems are the basic threats to the validity and reliability of responses to survey questions, including memory decay, telescoping, deceit, social-desirability response effects, and imprecise bounding of reference periods. The unique problems fall into four categories: inadequate or unrepresentative samples, unrepresentative domains of behavior measured, validity and reliability of the method, and methods effects.

Whereas the national victimization surveys cannot provide refined geographical and ecological data because of the dispersion of the probability samples across the United States, self-report surveys have other problems of representativeness and generalizability because they do not typically use national samples. Practically all self-report research is conducted with small samples of juveniles attending public schools in a community that, characteristically, is relatively small, often suburban or rural, and modally middle-class and white. This, of course, restricts the ability to generalize beyond these kinds of sample characteristics to adults, juveniles who are not in school, those who refuse to participate, urban inner-city juveniles, and poor and nonwhite youngsters. Such ‘‘convenience’’ samples also create analytic problems because data on those variables that are correlated with delinquency are simply unavailable or underrepresented in the sample. In short, most self-report research has somewhat limited generalizability because of typical sample characteristics. On the other hand, unlike the NCVS or UCR, self-report surveys were not intended originally to produce national or even generalizable estimates of the amount of juvenile delinquency and crime in the United States.

Self-report surveys were intended, however, to produce data on a variety of delinquent behaviors. Compared to the restricted range of index crimes included in the NCVS, the domain of behavior measured in self-report surveys is expansive, with as many as fifty illegal acts in a questionnaire not being uncommon. Such expansiveness, however, creates other problems. Historically, the juvenile court has had jurisdiction over both crimes and offenses that are illegal only for juveniles, usually referred to as status offenses and including truancy, incorrigibility, curfew violation, smoking, and drinking. Self-report surveys have correctly covered crimes and status offenses alike in studying juvenile delinquency, but in some cases there has been an overemphasis on the less serious offenses. To the extent that there is an overrepresentation of less serious and perhaps trivial offenses, self-report measures are inadequate measures of the kind of serious juvenile crime that is likely to come to the attention of authorities. This is important in describing accurately the characteristics of juvenile offenders and their behavior, as well as in comparing self-report and official data. Such comparison is crucial to validation research, where one needs to compare the same categories of behavior, including both content and seriousness, in order to assess the reciprocal validity of self-report and official measures. In criminology as elsewhere, one should not compare apples with oranges!

Unfortunately, there has been a dearth of this kind of careful validation research, as well as of systematic research on reliability. The accuracy and consistency of self-report surveys have been assumed to be quite acceptable, or, if questions have been posed, they have typically come from validity and reliability research on general social-science survey methods. For example, it has been assumed that anonymous surveys are more valid than signed surveys and that interviews are preferred over self-administered questionnaires. Yet no study had directly compared the validity, and only one had compared the reliability, of two or more self-report methods within the same study until the work of Michael Hindelang, Travis Hirschi, and Joseph Weis in 1981. In those isolated studies where validity and reliability were addressed, external validation criteria such as official record data have been used too infrequently.

Of course, critics have remained skeptical about the accuracy of responses from liars, cheaters, and thieves, as well as from straight and honorable persons. The latter are not motivated by deception or guile, but they may respond incorrectly because a questionnaire item has poor face validity, meaning that it does not make sufficiently clear what is being asked and that the respondent is consequently more free to interpret, construe, and attribute whatever is within his experience and imagination. For example, a common self-report item, ‘‘Have you ever taken anything from anyone under threat of force?’’ is intended to tap instances of robbery. However, respondents might answer affirmatively if they had ever appropriated a candy bar from their kid sister. This problem of item meaning and interpretation is chronic in survey research, but it only remains problematic if no validation research is undertaken to establish face validity. Unfortunately, this has been the case in the development of self-report instruments.

There has been a basic inattention to the psychometric properties of self-report surveys and attendant methods effects on measurement. From the psychometric research that went into the development of the NCVS, it is clearer that the bounding practices in self-report research have been inadequate: the reference periods are typically too long and variable from study to study. Most self-report surveys ask whether a respondent has ‘‘ever’’ committed a crime, a few use the past ‘‘three years,’’ some use the past ‘‘year,’’ but very few use the past ‘‘six months’’ (or less), which was established as the optimum recall period for the national victimization surveys. This poses threats to the accuracy of responses since it is established that the longer the reference period, the more problems with memory decay, telescoping, and misinterpretation of events.

A related problem arises when the self-report researcher wants to find out how often within a specified period a respondent has committed a crime. A favored means of measuring the frequency of involvement has been the ‘‘normative’’ response category. A respondent is asked, ‘‘How often in the past year have you hit a teacher?’’ and is given a set of response categories that includes ‘‘Very often,’’ ‘‘Often,’’ ‘‘Sometimes,’’ ‘‘Rarely,’’ and ‘‘Never.’’ One respondent can check ‘‘Rarely’’ and mean five times, whereas another can check the same response and mean one time. They each respond according to personal norms, which are tied to their own behavior, as well as to that of their peers. This creates analytic problems because one cannot norm (that is, accurately compare) the answers of each respondent, precluding meaningful comparisons. A great deal of information is thus lost. Simply asking each respondent to record the actual frequency of commission for each offense can solve these problems.

Finally, unlike the NCVS and the UCR, there is very little about self-report surveys— whether their samples, instruments, or procedures—that is ‘‘standardized.’’ This restricts the kinds of comparison across self-report studies that could lead to more improvements in the method and provide a more solid empirical foundation for theory construction and testing, as well as the possibility of nationwide self-report statistics comparable to those of the NCVS and the UCR.

This lack of standardization, inadequacies of samples, and the question of the differential validity and reliability of self-report and official measures of crime have led to two important developments in the research of crime statistics. The first is the initiation of surveys of national representative samples of juveniles for the purpose of estimating the extent and nature of delinquency and of substance abuse in the United States. The second is the conducting of more rigorous and comprehensive research on the differential validity and reliability of official, as compared to self-report, measures of crime and delinquency. In 1967, the National Institute of Mental Health initiated the first of an interrupted but relatively regular series of National Youth Surveys of a representative sample of teenage youths, who were interviewed about a variety of attitudes and behaviors, including delinquent behavior. This survey was repeated in 1972, and in 1976 the National Institute for Juvenile Justice and Delinquency Prevention became a cosponsor of what has become an annual self-report survey of the delinquent behavior of a national probability panel of youths aged from eleven to seventeen years. The two major goals are to measure the amount and distribution of self-reported delinquent behavior and official delinquency and to account for any observed changes in juvenile delinquency.

These periodic national self-report surveys allow more rigorous estimation of the nature and extent of delinquent behavior. It is ironic, however, that the validity, reliability, and viability of the self-report method as an alternative or adjunct to official measures were not assessed rigorously until Hindelang, Hirschi, and Weis began a study of measurement issues in delinquency research, focusing on the comparative utility of self-report and official data.

Within an experimental design, a comprehensive self-report instrument was administered to a random sample of sixteen hundred youths from fourteen to eighteen years of age, stratified by sex, race (white or black), socioeconomic status (high or low), and delinquency status (nondelinquent, police contact, or court record). Officially defined delinquents, boys, blacks, and lower-socioeconomic-status subjects were oversampled in order to facilitate data analysis within those groups that are often underrepresented in self-report studies. Subjects were randomly assigned to one of four test conditions that corresponded to four self-report methods of administration: anonymous questionnaire, signed questionnaire, face-to-face interview, and blind interview. A number of validation criteria were utilized, including the official records of those subjects identified in a reverse record check, a subset of questions administered by the randomized response method, a deep-probe interview for face-validity testing of a subset of delinquency items, and a follow-up interview with a psychological-stress evaluator to determine the veracity of responses. The subjects were brought to a field office, where they answered the questions within the method condition to which they were randomly assigned. This experimental design, coupled with a variety of external validation criteria and reliability checks, ensures that the findings and conclusions can be drawn with some confidence—undoubtedly with more confidence than in any prior research on validity and reliability in the measurement of delinquency.

Hindelang, Hirschi, and Weis’s study produced a variety of findings on the whole range of previously identified methodological problems and issues. Official crime statistics, it concluded, generate valid indications of the sociodemographic distribution of delinquency. Self-reports, indeed, measure a domain of delinquent behavior that does not overlap significantly with the domain covered by official data, particularly for the more serious crimes. However, self-reports can measure the same type and seriousness of delinquent behaviors as are represented in official records. Within the domain of delinquent behavior that they do measure, self-reports are very reliable and basically valid. Self-report samples have been inadequate in that they do not include enough officially defined delinquents, nonwhites, and lower-class youths to enable confident conclusions to be drawn regarding the correlates of the more serious delinquent acts for which a juvenile is more likely to acquire an official record. Delinquency, whether measured by official or self-report data, is not equally distributed among all segments of society—there are real differences between those youngsters who engage in crime and those who do not. Methods of administration have no significant effects on the prevalence, incidence, validity, or reliability of self-reports. There is apparently less validity in the self-reports of those respondents with the highest rates of delinquency—male, black, officially defined delinquents. Perhaps the most significant finding of the research is related to this finding of differential validity for a small subpopulation of respondents. As originally proposed by Hindelang, Hirschi, and Weis in 1979, the empirical evidence shows that there is no discrepancy in the major correlates of self-reported or official delinquency, except for race, which may be attributable to the less valid responses of black subjects, particularly males with official records of delinquency.

The finding that self-reports and official measures do not produce discrepant results regarding the distribution and correlates of delinquency, but rather show convergence, is a critical piece of evidence in the controversy that has existed among criminal statisticians since the dark figure was identified at the beginning of the nineteenth century. Does the distribution of crime look the same when crimes not known to the police are included in the overall distribution as it does when only crimes known to the police are counted? Are the different sources of crime statistics producing discrepant or convergent perspectives of crime?

Conclusion: Discrepancy or Convergence?

Returning to the two primary purposes of crime statistics, to measure the ‘‘amount’’ and ‘‘distribution’’ of crime, it is clear that there has been, and will probably continue to be, discrepancy among the estimates of the amount of crime that are generated by the variety of crime statistics. The dark figure of crime may never be completely illuminated, the reporting practices of victims will probably remain erratic, and the recording of crimes by authorities will continue to be less than uniform.

However, the ultimately more important purpose of crime statistics is the measurement of the distribution of crime by a variety of social, demographic, and geographic characteristics. Fortunately, the major sources of crime data— crimes known to the police, victimization surveys, and self-report surveys—generate similar distributions and correlates of crime, pointing to convergence rather than discrepancy among the measures of the basic characteristics of crime and criminals. The problems associated with each of the data sources remain, but they diminish in significance because these imperfect measures produce similar perspectives of crime. As Gwynn Nettler concluded, ‘‘Fortunately, despite the repeatedly discovered fact that more crime is committed than is recorded, when crimes are ranked by the frequency of their occurrence, the ordering is very much the same no matter which measure is used’’ (p. 97).

Comparisons of data from the UCR and the NCVS program show that they produce similar patterns of crime (Hindelang and Maltz). There is substantial agreement between the two measures in the ordering of the relative frequencies of each of the index crimes. Comparisons of self-reports of delinquency with crimes known to the police show that each provides a complementary rather than a contradictory perspective on juvenile crime (Hindelang, Hirschi, and Weis, 1981; Belson). Self-reports do not generate results on the distribution and correlates of delinquency that are contrary to those generated by police statistics or, for that matter, by victimization surveys. The youngsters who are more likely to appear in official police and court record data—boys, nonwhites, low achievers, youths with friends in trouble, urban residents, and youths with family problems—are also more likely to self-report higher rates of involvement in crime.
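The degree of agreement in such orderings can be quantified with a rank correlation. The short Python sketch below is purely illustrative and is not drawn from the studies cited above: the offense categories and counts are hypothetical, and a simple Spearman rank correlation (computed without ties) stands in for whatever comparison a given study actually used. A coefficient near 1 indicates that two measures rank the offenses in essentially the same order even when their absolute counts differ greatly.

    # Illustrative only: hypothetical counts of six offenses as they might appear
    # in police-recorded data versus a survey-based measure. Not real data.
    crimes = ["larceny", "burglary", "motor vehicle theft",
              "aggravated assault", "robbery", "rape"]
    police_counts = [7000, 2100, 1100, 900, 500, 90]      # hypothetical
    survey_counts = [15000, 5200, 2300, 2600, 900, 310]   # hypothetical

    def ranks(values):
        # Rank 1 = most frequent; assumes no ties among the counts.
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    def spearman_rho(x, y):
        # Spearman rank correlation for tie-free data: 1 - 6*sum(d^2) / (n*(n^2 - 1)).
        n = len(x)
        d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
        return 1 - 6 * d2 / (n * (n ** 2 - 1))

    print(round(spearman_rho(police_counts, survey_counts), 2))  # 0.94 for these made-up counts

In this hypothetical example the survey totals are far larger than the police counts (the dark figure), yet the coefficient is close to 1: the two measures disagree about the amount of crime far more than about its ordering, which is the point of convergence made above.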

This message should be of some comfort to a variety of people interested in crime and delinquency, from researchers and theorists to policymakers, planners, program implementers, and evaluators. The basic facts of crime are more consistent than many scholars and authorities in the past would lead one to believe. In fact, the major sources of official and unofficial crime statistics are not typically inconsistent in their representations of the general features of crime but rather provide a convergent perspective on crime. The characteristics, distribution, and correlates of crime and, therefore, the implications for theory, policy, and programs are not discrepant by crime measure, but convergent. The data generated by a variety of measures are compatible and confirming sources of information on crime. The study and control of crime can best be informed by these complementary sources of crime statistics.

Bibliography:

  • BELSON, WILLIAM. Juvenile Theft: The Causal Factors—A Report of an Investigation of the Tenability of Various Causal Hypotheses about the Development of Stealing by London Boys. New York: Harper & Row, 1975.
  • BIDERMAN, ALBERT. "Surveys of Population Samples for Estimating Crime Incidence." Annals of the American Academy of Political and Social Science 374 (1967): 16–33.
  • BLACK, DONALD. "Production of Crime Rates." American Sociological Review 35 (1970): 733–748.
  • BOGGS, SARAH. "Urban Crime Patterns." American Sociological Review 30 (1965): 899–908.
  • COHEN, LAWRENCE, and FELSON, MARCUS. "Social Change and Crime Rate Trends: A Routine Activity Approach." American Sociological Review 44 (1979): 588–608.
  • DOLESCHAL, EUGENE. "Sources of Basic Criminal Justice Statistics: A Brief Annotated Guide with Commentaries." Criminal Justice Abstracts 11 (1979): 122–147.
  • ENNIS, PHILIP. Criminal Victimization in the United States: A Report of a National Survey. Chicago: University of Chicago, National Opinion Research Center, 1967.
  • Federal Bureau of Investigation. Crime in the United States: Uniform Crime Reports for the United States. Washington, D.C.: U.S. Department of Justice, F.B.I., annually.
  • FIENBERG, STEPHEN. "Victimization and the National Crime Survey: Problems of Design and Analysis." Indicators of Crime and Criminal Justice: Quantitative Studies. Edited by Stephen E. Fienberg and Albert J. Reiss, Jr. Washington, D.C.: U.S. Department of Justice, Bureau of Justice Statistics, 1980. Pages 33–40.
  • GUERRY, ANDRÉ MICHEL. Essai sur la statistique morale de la France, précédé d'un Rapport à l'Académie des Sciences, par MM. Lacroix, Silvestre, et Girard. Paris: Crochard, 1833.
  • HINDELANG, MICHAEL; GOTTFREDSON, MICHAEL R.; and GAROFALO, JAMES. Victims of Personal Crime: An Empirical Foundation for a Theory of Personal Victimization. Cambridge, Mass.: Ballinger, 1978.
  • HINDELANG, MICHAEL. "The Uniform Crime Reports Revisited." Journal of Criminal Justice 2 (1974): 1–17.
  • HINDELANG, MICHAEL; HIRSCHI, TRAVIS; and WEIS, JOSEPH G. "Correlates of Delinquency: The Illusion of Discrepancy between Self-Report and Official Measures." American Sociological Review 44 (1979): 995–1014.
  • HINDELANG, MICHAEL; HIRSCHI, TRAVIS; and WEIS, JOSEPH G. Measuring Delinquency. Beverly Hills, Calif.: Sage, 1981.
  • HIRSCHI, TRAVIS. Causes of Delinquency. Berkeley: University of California Press, 1969.
  • KEPPEL, ROBERT, and WEIS, JOSEPH. "Improving the Investigation of Violent Crime: The Homicide Investigation and Tracking System." Washington, D.C.: U.S. Department of Justice, 1993.
  • KLINGER, DAVID, and BRIDGES, GEORGE. "Measurement Error in Calls-For-Service as an Indicator of Crime." Criminology 35 (1997): 705–726.
  • KULIK, JAMES; STEIN, KENNETH B.; and SARBIN, THEODORE R. "Disclosure of Delinquent Behavior under Conditions of Anonymity and Nonanonymity." Journal of Consulting and Clinical Psychology 32 (1968): 506–509.
  • MALTZ, MICHAEL. "Crime Statistics: A Mathematical Perspective." Journal of Criminal Justice 3 (1975): 177–193.
  • MURPHY, FRED; SHIRLEY, MARY M.; and WITMER, HELEN L. "The Incidence of Hidden Delinquency." American Journal of Orthopsychiatry 16 (1946): 686–696.
  • NETTLER, GWYNN. Explaining Crime, 2d ed. New York: McGraw-Hill, 1978.
  • NYE, F. IVAN. Family Relationships and Delinquent Behavior. New York: Wiley, 1958.
  • PENICK, BETTYE EIDSON, and OWENS, MAURICE E. B., III, eds. Surveying Crime. Washington, D.C.: National Academy of Sciences, National Research Council, Panel for the Evaluation of Crime Surveys, 1976.
  • PORTERFIELD, AUSTIN. Youth in Trouble: Studies in Delinquency and Despair, with Plans for Prevention. Fort Worth, Tex.: Leo Potishman Foundation, 1946.
  • President's Commission on Law Enforcement and Administration of Justice. The Challenge of Crime in a Free Society. Washington, D.C.: The Commission, 1967.
  • QUETELET, ADOLPHE. Recherches sur le penchant au crime aux différens âges, 2d ed. Brussels: Hayez, 1833.
  • RAE, RICHARD. "Crime Statistics, Science or Mythology." Police Chief 42 (1975): 72–73.
  • REAVES, BRIAN. "Using NIBRS Data to Analyze Violent Crime." Bureau of Justice Statistics Technical Report. Washington, D.C.: U.S. Department of Justice, 1993.
  • REISS, ALBERT, JR. Studies in Crime and Law Enforcement in Major Metropolitan Areas. Field Survey III, vol. 1. President's Commission on Law Enforcement and Administration of Justice. Washington, D.C.: The Commission, 1967.
  • ROBERTS, DAVID. "Implementing the National Incident-Based Reporting System: A Project Status Report. A Joint Project of the Bureau of Justice Statistics and the Federal Bureau of Investigation [SEARCH, the National Consortium for Justice Information and Statistics]." Washington, D.C.: U.S. Department of Justice, 1997.
  • ROBINSON, LOUIS. History and Organization of Criminal Statistics in the United States (1911). Reprint. Montclair, N.J.: Patterson Smith, 1969.
  • SELLIN, THORSTEN. "The Basis of a Crime Index." Journal of the American Institute of Criminal Law and Criminology 22 (1931): 335–356.
  • SELLIN, THORSTEN, and WOLFGANG, MARVIN. The Measurement of Delinquency. New York: Wiley, 1964.
  • SHORT, JAMES, JR., and NYE, F. IVAN. "Reported Behavior as a Criterion of Deviant Behavior." Social Problems 5 (1957–1958): 207–213.
  • SKOGAN, WESLEY. "Dimensions of the Dark Figure of Unreported Crime." Crime and Delinquency 23 (1977): 41–50.
  • SPARKS, RICHARD. "Criminal Opportunities and Crime Rates." Indicators of Crime and Criminal Justice: Quantitative Studies. Edited by Stephen E. Fienberg and Albert J. Reiss, Jr. Washington, D.C.: U.S. Department of Justice, Bureau of Justice Statistics, 1980. Pages 18–32.
  • TURNER, ANTHONY. San Jose Methods Test of Known Crime Victims. Washington, D.C.: U.S. Department of Justice, Law Enforcement Assistance Administration, National Institute of Law Enforcement and Criminal Justice, 1972.
  • U.S. Bureau of the Census. Criminal Victimization Surveys in the Nation's Five Largest Cities: National Crime Panel Surveys of Chicago, Detroit, Los Angeles, New York, and Philadelphia. Washington, D.C.: U.S. Department of Justice, Law Enforcement Assistance Administration, National Criminal Justice Information and Statistics Service, 1975.
  • U.S. Department of Justice, Law Enforcement Assistance Administration, National Criminal Justice Information and Statistics Service. Criminal Victimization Surveys in Eight American Cities: A Comparison of 1971/1972 and 1974/1975 Findings. Washington, D.C.: NCJISS, 1976.
  • U.S. National Commission on Law Observance and Enforcement [Wickersham Commission]. Report on Criminal Statistics. Washington, D.C.: The Commission, 1931.
  • VOLLMER, AUGUST. "The Bureau of Criminal Records." Journal of the American Institute of Criminal Law and Criminology 11 (1920): 171–180.
  • WALLERSTEIN, JAMES, and WYLE, CLEMENT J. "Our Law-Abiding Law-Breakers." Probation 25 (1947): 107–112.
  • WARNER, BARBARA, and PIERCE, GLENN L. "Reexamining Social Disorganization Theory Using Calls to the Police as a Measure of Crime." Criminology 31 (1993): 493–517.
  • WILKINS, LESLIE. Social Deviance: Social Policy, Action, and Research. Englewood Cliffs, N.J.: Prentice-Hall, 1965.
