In Part 1 of our series on mobile devices, we discussed preserving and collecting mobile device data. In Part 2, we turned to the types of information you can expect to encounter with mobile devices and key considerations for analyzing, reviewing, and producing these types of data. In Part 3, we examine advanced strategies for analyzing device data and how you can apply those strategies to your cases.
Understanding and organizing the available data
An initial step toward better understanding mobile device data is to organize that content, conceptually, into two groups: communications data and non-communications data. The chart below displays different types of information available from mobile devices organized into those two groups:
Communications Data | Non-Communications Data |
Call Logs | Calendar |
Chats | Configurations |
Contacts | |
Instant Messages | Databases |
MMS Messages | Installed Applications |
SMS Messages | Journeys |
Voicemails | Locations |
| Notes |
| Passwords |
| Searched Items |
| User Accounts |
| Web History |
| Wireless Networks |
Communications data encompasses all communications available from or through the device. It includes emails from whatever email accounts are on the phone, business and personal. It also includes messaging from social media and chat applications, such as Skype, Facebook, and LinkedIn, running the full spectrum from purely personal to professional only.
From an analytical perspective, communications data matters because it can help you figure out who was communicating with whom, when, and about what. The communications content is key to efforts to find out what happened leading up to a lawsuit or investigation and to building up and tearing down the narratives that help drive investigations and lawsuits to resolution. Metadata from communications can be used to help create timelines, map webs of communications between actors you care about, and identify gaps in communications. Text from communications can be used both to support or refute hypotheses you have constructed and to help you find important information you had not imagined might exist.
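As a rough illustration of how communications metadata can surface gaps, the following Python sketch sorts message timestamps and reports any stretch longer than a chosen threshold. The `find_gaps` helper, the sample timestamps, and the one-day threshold are all hypothetical; real timestamps would come from the forensic report.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, min_gap):
    """Return (start, end) pairs where consecutive messages are at least min_gap apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a >= min_gap]

# Hypothetical message times between two custodians.
sample = [
    datetime(2017, 5, 1, 9, 0),
    datetime(2017, 5, 1, 9, 30),
    datetime(2017, 5, 4, 14, 0),
    datetime(2017, 5, 4, 14, 5),
]

gaps = find_gaps(sample, timedelta(days=1))
for start, end in gaps:
    print(f"No messages between {start} and {end}")
```

A gap found this way is only a lead. It may reflect deleted messages, a switch to another application, or simply a quiet period.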
If you limit your analysis to specific categories of content, you may find that a related data set also needs to be included to get a more complete picture of what happened. For example, “Instant Messages” data often is not included in the “Chats” data; if you only consider the latter, you could be ignoring key information. You may also find indications of data sources outside of the mobile device that you may want to collect, such as cloud-based accounts that are identified in the “User Accounts” data. Technology professionals can help you take a deeper dive into mobile data that may reveal more about the user’s activity.
Non-communications data consists of every form of data other than communications data. The content can include passwords needed to access content on the device or elsewhere; various types of notes entered by the user; multiple different sources of location data, including locations at which the user took pictures; photos the user took or received; number of steps taken; and so on.
Non-communications data offers a wealth of information potentially available from mobile devices that can be used for fact development, such as location history, web browsing activity, and call logs. When considering advanced analysis of mobile devices, you should first have a high-level understanding of the types of available information and the methods mobile devices typically use to store data. Most mobile devices store information in a series of databases, usually in SQLite format. These databases store everything from chat communications to file locations to individual applications’ user data. Non-communications data is most likely found in one or more databases on the phone. Forensic tools, such as Cellebrite, perform the extraction and organization of the database information for reporting, but it is also possible to extract and review the individual SQLite databases. It is important to note that forensic tools do not export information from every database; this can be due to encryption, a proprietary format, or an inability of the forensic tool to export that type of data.
Non-communications data can be analyzed in a wide range of ways. For a personal injury case where the plaintiff has alleged impaired mobility, for example, you might use data about number of steps per day to help refute the assertion that the plaintiff can no longer walk for more than five minutes at a stretch. Or in a food contamination matter you might compare information about calendar entries, locations, and wireless networks to demonstrate that inspectors were not actually inspecting sites when they claimed to be.
To better appreciate the scope of data available from mobile devices, the following is a list of the 35 categories of data shown in a Cellebrite spreadsheet report for an actual phone, a topic we covered in Part 2. For each category listed, we have shown the number of items in the spreadsheet. The total number of items for the first 34 categories is 88,981. The final category, “Timeline”, contains an additional 47,389 rows of information. For each category ask yourself, “How might analyzing this data, alone or in conjunction with other data, help me in my matter?”
Spreadsheet Tab | Number of Items in Tab | Number of Columns in Tab | Description |
Summary | n/a | n/a | Basic information such as device type, report creation date and time, and name of examiner. |
Device Information | n/a | n/a | Information about the device, such as serial number, model number, and OS version; last activation time; and phone settings such as time zone, locale language, and whether cloud backup was enabled. |
Archives | 7 | 26 | Information about archives, such as name, size, path, and modified date. |
Audio | 1,501 | 26 | Information about voicemail and recordings. |
Bluetooth Devices | 1,087 | 9 | Information about Bluetooth devices, such as device name and MAC (media access control) address. |
Calendar | 336 | 22 | Information about calendar entries such as subject, dates, and attendees. |
Call Log | 6,789 | 16 | Information about calls made and received, including phone numbers, dates and times, and duration. |
Chats | 2,661 | 50 | Information about chat communications, including dates and times, participants, and the chats themselves. |
Configurations | 38,402 | 26 | Information about configurations of the various applications used by the mobile device. |
Contacts | 2,203 | 21 | Information about contacts on the phone, such as name, phone numbers and email addresses, and sources of the contacts. |
Cookies | 5,824 | 16 | Information about cookies on the device, including name (e.g., “GAPS”), domain (e.g., “accounts.google.com”), and related application (e.g., “Hangouts”). |
Databases | 586 | 29 | Information about databases on the phone, including application (e.g., “Kindle”), path, and associated metadata. |
Device Notifications | 201 | 21 | Information about notifications stored on the device, such as “New JetShuttle SOUTH FLORIDA(West Palm Beach)-NEW YORK(Teterboro) (25 MAY) available for you”. |
Document | 3 | 26 | Information about documents stored on the device. |
Emails | 2,531 | 22 | Information about email messages, such as from and to, date and time, and source. |
Image Hashes | 1 | 6 | |
Images | 4,756 | 26 | Information about images on the device, including file name, path, and metadata. |
Installed Applications | 259 | 20 | Information about applications installed on the device, such as name (e.g., “Audible”), identifier (e.g., “com.audible.iphone”), and purchase date. |
Instant Messages | 552 | 19 | Information about instant messages, such as from and to, subject, body, and date and time. |
Journeys | 18 | 12 | Information about trips taken, such as journey (e.g., “Uber trip” or “My Location”), start and end times, and from and to points (e.g., “5/2/2017 11:22:50 PM(UTC-4): (25.703148, -75.041062)”). |
Locations | 896 | 19 | Information about stored location information, such as wi-fi connections, geo-tagged media files, geo-tagged calls and chats, and map application activity. |
Log Entries | 310 | 17 | Information about use of the device, such as the amount of logged data usage by applications. |
MMS Messages | 445 | 37 | Information about MMS (Multimedia Messaging Service) messages, such as from and to, date, and body of message. |
Notes | 35 | 16 | Information about notes on the device, such as title, body, and dates and times. |
Passwords | 164 | 12 | Information about passwords used with the device. |
Searched Items | 658 | 13 | Information about searches performed using the device, such as source (e.g., “Safari”) and value (e.g., “what’s the difference between contemporary and modern”). |
SMS Messages | 12,050 | 19 | Information about SMS (Short Message Service) messages, such as from and to, date, and body of message. |
Text | 35 | 26 | Information about text and log files stored on the device. |
User Accounts | 21 | 15 | Information about user accounts on the device, including user name (e.g., an email address) and entries (e.g., “-Account Description: Exchange”). |
Videos | 28 | 26 | Information about video files on the device, including name, path, and metadata. |
Voicemails | 1,498 | 11 | Information about voicemail messages, including from (phone number and possibly name), date and time, and duration. |
Web Bookmarks | 56 | 16 | Information about webpages bookmarked. |
Web History | 5,466 | 11 | Information about webpages visited. |
Wireless Networks | 52 | 17 | Information about wireless networks connected to. |
Timeline | 47,389 | 23 | Comprehensive information about device activity, including user communication activity. |
A closer look. Each of the tabs contains a range of information about its contents. To illustrate this, we will take a look at the “Databases” tab (see screenshot). This tab lists the 586 SQLite databases on the device. Each database has up to 29 columns of information.
This database report is the first report you should review if you are interested in seeing what information is available on the mobile device. You may find that the user relied heavily on a new or obscure application whose data your forensic tool did not extract, which means key information could be omitted if you rely on the standard reports alone.
If you find an important application that is listed in the database report, you may be able to review the database by opening it with an SQLite reader. Generally, it is advisable to partner with an experienced forensic expert for advanced analyses of mobile SQLite databases. It will be necessary to spend time understanding the structure of that application’s database before you can determine what it contains. You can learn more about SQLite here and download a free SQLite reader here.
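For readers comfortable with scripting, a first pass at an unfamiliar database can also be made with Python's built-in `sqlite3` module. The sketch below opens a file read-only and lists each table with its row count. The `summarize_tables` helper and the small demo database it builds are hypothetical, and we assume you are working on a copy of the extracted database rather than the original evidence.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

def summarize_tables(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    # mode=ro opens the file read-only so the review does not alter it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # quote the identifier safely
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()

# Build a small stand-in database so the sketch runs end to end (hypothetical schema).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.sqlite")
    demo = sqlite3.connect(demo_path)
    demo.execute("CREATE TABLE annotation (id INTEGER PRIMARY KEY, note TEXT)")
    demo.executemany("INSERT INTO annotation (note) VALUES (?)", [("a",), ("b",), ("c",)])
    demo.commit()
    demo.close()
    summary = summarize_tables(demo_path)

print(summary)
```

Table names and row counts only hint at what an application stores. Interpreting the contents still requires understanding that application's schema.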
As mentioned, the “Databases” tab contains 29 columns. The following table is the entry for one of the databases, showing the columns and their contents for that database. Note that for some of these columns we have provided explanations; for others we have not:
Column Title | Contents | Explanation |
# | 5 | A sequential number assigned when the spreadsheet report is generated. |
File System | iPhone | |
Name | AEAnnotation_v10312011_1728_local.sqlite | |
Row count | 126 | The number of records in this database, which represents the amount of activity contained in the database. |
Decoded by | | Whether Cellebrite decoded and is able to export the contents. |
Application | iBooks | The database’s application, such as Facebook or Address Book. |
Size (bytes) | 102400 | |
Path | iPhone/Applications/com.apple.iBooks/Documents/storeFiles/AEAnnotation_v10312011_1728_local.sqlite | |
Encrypted | | Whether the database is encrypted. This is extremely important for identifying any data sources that may be impossible to extract because of encryption. |
Meta Data | iPhone Domain:AppDomain-com.apple.iBooks Encryption Key:030000002B5BCDAD7156BBEA9D9847102D7F7604594AA54DAB03EF75C5B347A8120AC9C8AB7271F3D030E65F iTunes Backup original file name:715cda36fa46cecd13b4fc1ba61c82817895224f File size:102400 Bytes Chunks:1 Date & Time Creation time:12/25/2014 4:42:37 PM(UTC+0) Modify time:5/3/2017 3:16:59 PM(UTC+0) Last access time: Deleted time: Offsets Data offset:0x0 | |
Tags | Database | |
MD5 | b7c315510c4c398867c4f260413f44de | |
Hash sets | ||
Category | ||
SHA256 | ||
Modified-Date | 5/3/2017 | |
Modified-Time | 5/3/2017 11:16:59 AM(UTC-4) | |
Created-Date | 12/25/2014 | |
Created-Time | 12/25/2014 11:42:37 AM(UTC-5) | |
Accessed-Date | ||
Accessed-Time | ||
Deleted | ||
Deleted-Date | ||
Deleted-Time | ||
Tag Note | ||
Additional file info | ||
Attachment source app | ||
Carving | False | |
Duplicates |
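Note that the same event can appear in the report with different UTC offsets: the “Meta Data” entry shows the modify time in UTC+0, while “Modified-Time” shows it in UTC-4. The Python sketch below shows one way to normalize such values before building a timeline. The `parse_report_time` helper is hypothetical, and it assumes timestamps follow the month/day/year, 12-hour pattern seen in the table above.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches values such as "5/3/2017 11:16:59 AM(UTC-4)".
_PATTERN = re.compile(r"^(.+ [AP]M)\(UTC([+-])(\d{1,2})(?::(\d{2}))?\)$")

def parse_report_time(text):
    """Convert a report timestamp string into a timezone-aware datetime."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognized timestamp: {text!r}")
    clock, sign, hours, minutes = match.groups()
    offset = timedelta(hours=int(hours), minutes=int(minutes or 0))
    if sign == "-":
        offset = -offset
    # Assumption: month/day/year order with a 12-hour clock.
    local = datetime.strptime(clock, "%m/%d/%Y %I:%M:%S %p")
    return local.replace(tzinfo=timezone(offset))

modified_utc = parse_report_time("5/3/2017 3:16:59 PM(UTC+0)")
modified_local = parse_report_time("5/3/2017 11:16:59 AM(UTC-4)")

# Aware datetimes compare by instant, so these two values are equal.
print(modified_utc == modified_local)
print(modified_local.astimezone(timezone.utc).isoformat())
```

Converting everything to a single offset before sorting helps avoid events appearing out of order when records from different tabs are combined.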
Conducting advanced mobile device data analysis
The volume and variety of information available from even a single cellphone offer a wide array of ways to analyze the content from mobile devices. As discussed above, the mobile phone we looked at for this exercise contained 35 categories of data, with a tab in the mobile device spreadsheet for each category. The tabs contained a combined total of 136,370 rows of information, with between 1 and 47,389 rows per tab, an average of 4,132, and a median of 2,781 rows per tab. The tabs contained a combined total of 671 columns of information, with between 6 and 50 columns per tab, an average of 20, and a median of 19 columns per tab. In all, there were 210 different column names. Most column names appeared only one, two, or three times. Others appeared regularly, such as “Name” (16 times); “Source” (12 times); and “Created-Date”, “Created-Time”, “Modified-Date”, and “Modified-Time” (10 times each).
Type of Information | Total in Spreadsheet | Maximum | Minimum | Average | Median |
Category (one per tab) | 35 | ||||
Rows | 136,370 | 47,389 per tab | 1 per tab | 4,132 per tab | 2,781 per tab |
Columns | 671 | 50 per tab | 6 per tab | 20 per tab | 19 per tab |
Column Names | 210 | 31 occurrences | 1 occurrence | 3 occurrences | 1 occurrence |
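Summary figures like these can be produced with a few lines of Python once the column headers of each tab have been read into memory. The sketch below is illustrative only: the `tabs` dictionary and its column names are hypothetical stand-ins, not the actual headers from the report.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical mapping of tab name -> column headers in that tab.
tabs = {
    "Calendar": ["#", "Subject", "Start-Date", "Source"],
    "Notes": ["#", "Title", "Body", "Source"],
    "Web History": ["#", "Title", "URL"],
}

# Count how many tabs each column name appears in.
name_counts = Counter(name for columns in tabs.values() for name in columns)
columns_per_tab = [len(columns) for columns in tabs.values()]

print("Total columns:", sum(columns_per_tab))
print("Distinct column names:", len(name_counts))
print("Average columns per tab:", round(mean(columns_per_tab), 1))
print("Median columns per tab:", median(columns_per_tab))
print("Most common names:", name_counts.most_common(3))
```

Column names that recur across many tabs, such as dates and sources, are natural candidates for joining or comparing records from different categories.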
Given this volume and variety of data, what forms of analyses can be performed? You could try using simple keyword searches or more advanced Boolean searches, but by themselves these approaches are not likely to deliver results of great interest. You could attempt to perform some form of technology-assisted review (TAR), but even if you were able to figure out how to deploy a TAR tool against a group of spreadsheets, the amount of data in the spreadsheet cells would be too sparse for most if not all TAR tools to deliver any meaningful results.
Instead, this is a great time to don the best thinking cap you can find. Start with your objectives. Are you, for example, attempting to determine whether you can prove an affirmative defense? If so, what are the elements of that defense? And for each element, what do you need to prove? And to prove each element, what information do you need? And to find that information, what do you need to look for?
Because this may be getting a bit too abstract, here are two specific examples of analyses that can be performed: contact resolution analysis, and geolocation analysis.
Contact Resolution Analysis. Contact resolution analysis matches the identifiers that appear in the communications data, such as phone numbers, email addresses, and account IDs, to the real-world names or aliases found in the address book. The following is an example of address book information that can be used:
The goal of the analysis is to identify “John Smith” every time one of his three phone numbers or his email address appears within the communications. This process requires importing every communication record into a database, parsing each of the address book entries, and updating the To/From fields to include “John Smith” alongside the phone number or email address, because many communication records carry only the number or address and not his name. The challenge with this analysis is that address books are not always reliable. Some people make mistakes when entering names and phone numbers, which can lead to misidentified individuals. Shared numbers, such as an office main line as opposed to a direct line, can also point to more than one person, so careful analysis is required.
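The matching step can be sketched in Python as follows. This is a simplified illustration, not a description of how any particular tool works: the `normalize`, `build_lookup`, and `resolve` helpers, the sample address book, and the choice to compare phone numbers on their last 10 digits are all assumptions made for the example.

```python
import re

def normalize(identifier):
    """Reduce a phone number or email address to a comparable key."""
    identifier = identifier.strip().lower()
    if "@" in identifier:
        return identifier
    digits = re.sub(r"\D", "", identifier)
    # Assumption: compare on the last 10 digits to ignore country codes such as +1.
    return digits[-10:]

def build_lookup(address_book):
    """Map each normalized identifier to the set of contact names that use it."""
    lookup = {}
    for name, identifiers in address_book.items():
        for identifier in identifiers:
            lookup.setdefault(normalize(identifier), set()).add(name)
    return lookup

def resolve(identifier, lookup):
    """Prefix the identifier with matching names; leave it unchanged if none match."""
    names = lookup.get(normalize(identifier))
    if not names:
        return identifier
    # More than one name signals a shared number that needs manual review.
    return f"{' / '.join(sorted(names))} <{identifier}>"

# Hypothetical address book and message records.
address_book = {
    "John Smith": ["(555) 010-1234", "+1 555 010 5678", "555-010-2468", "jsmith@example.com"],
    "Jane Doe": ["555-010-9999"],
}
messages = [
    {"from": "+15550101234", "to": "5550109999"},
    {"from": "555.010.0000", "to": "JSmith@example.com"},
]

lookup = build_lookup(address_book)
resolved = [{field: resolve(value, lookup) for field, value in m.items()} for m in messages]
for record in resolved:
    print(record)
```

Keeping the original identifier next to the resolved name preserves the underlying record, and listing every matching name makes shared numbers visible instead of silently assigning them to one person.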
Once the matching is performed, you can mine the communications data for patterns and identify key individuals. Analyzing communication patterns can demonstrate relationships among parties and the frequency of communication. Tools such as Brainspace, NexLP, and Tableau can visualize communication patterns and enhance your review of this information. The following example shows a visualization of email communications of Enron data in NexLP:
Geolocation analysis. While analysis of a custodian’s location history has frequently been a tool for criminal cases, it also can be useful in civil matters and investigations. For cases such as intellectual property theft and internal investigations, knowing where someone was at a specific time is important. Mobile devices—for better or worse—might provide that information. Some phones have built-in applications that track a user’s geolocation, and some user-installed applications may collect geolocation data. Applications such as map and chat programs can track and store a user’s location, and some users opt to include geotagging in their photos and videos. All this information can be analyzed.
Most geolocation data is reliable enough for analysis, but the location information may be off by up to several hundred yards if the user is in a location with a bad GPS signal or is traveling at high speeds. That means that cases that require pinpoint-accurate location information may not be able to fully rely on the data from the mobile device itself and may require that information be obtained from other sources.
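One way to account for this imprecision is to test whether a recorded point falls within a radius of a site of interest plus a tolerance, rather than looking for an exact match. The Python sketch below uses the haversine formula for great-circle distance. The `near_site` helper, the coordinates, and the 300-meter default tolerance (roughly several hundred yards) are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def near_site(point, site, radius_m, tolerance_m=300):
    """True if the point is within radius_m of the site, allowing for GPS error."""
    return haversine_m(*point, *site) <= radius_m + tolerance_m

# Hypothetical site and two recorded device locations.
site = (40.0, -75.0)
point_a = (40.001, -75.0)  # roughly 111 meters north of the site
point_b = (40.01, -75.0)   # roughly 1.1 kilometers north of the site

print(near_site(point_a, site, radius_m=100))
print(near_site(point_b, site, radius_m=100))
```

The right tolerance depends on the device, the data source, and the conditions at the time, so it is a question to raise with your forensic expert.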
Mobile device forensic reports typically store geolocation data in a single report. For Cellebrite, this information can be found in the “Locations” report. These reports include information about wireless network usage, where pictures were taken, maps (lookup and starting location), and even locations where text messages were sent. You can filter the type of events and build maps or timeline analysis to show where and when events took place. The following is an example of a simplified geo-mapping of the location information from a Cellebrite report.
Conclusion
Although often not even preserved in civil litigation, the content from mobile devices offers enormous opportunities to lawyers and their staff seeking to better understand their matters and hoping to be able to build, test, and present more effective narratives.
The basic reports most commonly generated about the contents of mobile devices offer a starting point, and sometimes a very good one, for those seeking insight from that data. To take better advantage of the opportunities that mobile device data presents, you should consider moving beyond the basics and taking the first steps toward performing more advanced analytics on that data.