Part 3: Data, Data Everywhere – Advanced Strategies for Analyzing Mobile Data

In Part 1 of our series on mobile devices, we discussed preserving and collecting mobile device data. In Part 2, we turned to the types of information you can expect to encounter with mobile devices and key considerations for analyzing, reviewing, and producing these types of data. In Part 3, we examine advanced strategies for analyzing device data and how you can apply those strategies to your cases.

Understanding and organizing the available data

An initial step toward better understanding mobile device data is to organize that content, conceptually, into two groups: communications data and non-communications data. The chart below displays different types of information available from mobile devices organized into those two groups:

Communications Data | Non-Communications Data
Call Logs | Calendar
Chats | Configurations
Email | Contacts
Instant Messages | Databases
MMS Messages | Installed Applications
SMS Messages | Journeys
Voicemails | Locations
 | Notes
 | Passwords
 | Searched Items
 | User Accounts
 | Web History
 | Wireless Networks

Communications data encompasses all communications available from or through the device. It includes emails from whatever email accounts are on the phone, business and personal. It also includes messaging from social media and chat applications, such as Skype, Facebook, and LinkedIn, running the full spectrum from purely personal to professional only.

From an analytical perspective, communications data matters because it can help you figure out who was communicating with whom, when, and about what. The communications content is key to efforts to find out what happened leading up to a lawsuit or investigation and to building up and tearing down the narratives that help drive investigations and lawsuits to resolution. Metadata from communications can be used to help create timelines, map webs of communications between actors you care about, and identify gaps in communications. Text from communications can be used both to support or refute hypotheses you have constructed and to help you find important information you had not imagined might exist.
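
To make the metadata ideas above more concrete, here is a minimal sketch that tallies how often each pair of parties communicated and flags long silences between messages. The sample records, the names `pair_counts` and `find_gaps`, and the 48-hour threshold are all assumptions made for illustration; they do not come from any actual report.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical metadata records: (timestamp, sender, recipient).
MESSAGES = [
    (datetime(2017, 5, 1, 9, 0), "Alice", "Bob"),
    (datetime(2017, 5, 1, 17, 30), "Bob", "Alice"),
    (datetime(2017, 5, 5, 8, 15), "Alice", "Carol"),
    (datetime(2017, 5, 5, 8, 20), "Bob", "Alice"),
]


def pair_counts(messages):
    """Tally messages per pair of parties, ignoring direction."""
    return Counter(
        tuple(sorted((sender, recipient))) for _, sender, recipient in messages
    )


def find_gaps(messages, threshold=timedelta(hours=48)):
    """Return (start, end) pairs where consecutive messages are more than `threshold` apart."""
    times = sorted(ts for ts, _, _ in messages)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > threshold]


print(pair_counts(MESSAGES).most_common())
for start, end in find_gaps(MESSAGES):
    print(f"No messages between {start} and {end}")
```

A tally like this is only a starting point. It can tell you where to look, but the content of the messages still has to be read in context.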

If you limit your analysis to specific categories of content, you may find that a related data set also needs to be included to get a more complete picture of what happened. For example, “Instant Messages” data often is not included in the “Chats” data; if you only consider the latter, you could be ignoring key information. You may also find indications of data sources outside of the mobile device that you may want to collect, such as cloud-based accounts that are identified in the “User Accounts” data. Technology professionals can help you take a deeper dive into mobile data that may reveal more about the user’s activity.

Non-communications data consists of every form of data other than communications data. The content can include passwords needed to access content on the device or elsewhere; various types of notes entered by the user; multiple different sources of location data, including locations at which the user took pictures; photos the user took or received; number of steps taken; and so on.

Non-communications data offers a wealth of information potentially available from mobile devices that can be used for fact development, such as location history, web browsing activity, and call logs. When considering advanced analysis of mobile devices, you should first have a high-level understanding of the types of available information and the methods mobile devices typically use to store data. Most mobile devices store information in a series of databases, usually in SQLite format. These databases store everything from chat communications to file locations to individual applications’ user data. Non-communications data is most likely found in one or more databases on the phone. Forensic tools, such as Cellebrite, perform the extraction and organization of the database information for reporting, but it is also possible to extract and review the individual SQLite databases. It is important to note that forensic tools do not export information from every database; this can be due to encryption, a proprietary format, or an inability of the forensic tool to export that type of data.

Non-communications data can be analyzed in a wide range of ways. For a personal injury case where the plaintiff has alleged impaired mobility, for example, you might use data about number of steps per day to help refute the assertion that the plaintiff can no longer walk for more than five minutes at a stretch. Or in a food contamination matter you might compare information about calendar entries, locations, and wireless networks to demonstrate that inspectors were not actually inspecting sites when they claimed to be.
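
The step-count example can be sketched in a few lines. This assumes the step samples have already been exported from the device as timestamp and count pairs; the sample values, the function names, and the 4,000-step cutoff are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical step-count samples: (time of sample, steps recorded in that sample).
SAMPLES = [
    (datetime(2017, 5, 1, 8, 0), 1200),
    (datetime(2017, 5, 1, 18, 0), 3400),
    (datetime(2017, 5, 2, 9, 30), 250),
    (datetime(2017, 5, 3, 7, 45), 6100),
]


def steps_per_day(samples):
    """Sum the step samples for each calendar day."""
    totals = defaultdict(int)
    for ts, steps in samples:
        totals[ts.date()] += steps
    return dict(totals)


def days_over(samples, minimum_steps):
    """Return the dates, in order, whose total meets or exceeds `minimum_steps`."""
    totals = steps_per_day(samples)
    return sorted(day for day, total in totals.items() if total >= minimum_steps)


for day in days_over(SAMPLES, 4000):
    print(day, steps_per_day(SAMPLES)[day])
```

Daily totals alone do not show how long the user walked at a stretch, so a real analysis would likely also look at the timing of individual samples.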

To better appreciate the scope of data available from mobile devices, the following is a list of the 35 categories of data shown in a Cellebrite spreadsheet report for an actual phone, a topic we covered in Part 2. For each category listed, we have shown the number of items in the spreadsheet. The total number of items for the first 34 categories is 88,981. The final category, “Timeline”, contains an additional 47,389 rows of information. For each category ask yourself, “How might analyzing this data, alone or in conjunction with other data, help me in my matter?”

Spreadsheet Tab Number of Items in Tab Number of Columns in Tab Description
Summary n/a n/a Basic information such as device type, report creation date and time, and name of examiner.
Device Information n/a n/a Information about the device, such as serial number, model number, and OS version; last activation time; and phone settings such as time zone, locale language, and whether cloud backup was enabled.
Archives 7 26 Information about archives, such as name, size, path, and modified date.
Audio 1,501 26 Information about voicemail and recordings.
Bluetooth Devices 1,087 9 Information about Bluetooth devices, such as device name and MAC (media access control) address.
Calendar 336 22 Information about calendar entries such as subject, dates, and attendees.
Call Log 6,789 16 Information about calls made and received, including phone numbers, dates and times, and duration.
Chats 2,661 50 Information about chat communications, including dates and times, participants, and the chats themselves.
Configurations 38,402 26 Information about configurations of the various applications used by the mobile device.
Contacts 2,203 21 Information about contacts on the phone, such as name, phone numbers and email addresses, and sources of the contacts.
Cookies 5,824 16 Information about cookies on the device, including name (e.g., “GAPS”), domain (e.g., “accounts.google.com”), and related application (e.g., “Hangouts”).
Databases 586 29 Information about databases on the phone, including application (e.g., “Kindle”), path, and associated metadata.
Device Notifications 201 21 Information about notifications stored on the device, such as “New JetShuttle SOUTH FLORIDA(West Palm Beach)-NEW YORK(Teterboro) (25 MAY) available for you”.
Document 3 26 Information about documents stored on the device.
Emails 2,531 22 Information about email messages, such as from and to, date and time, and source.
Image Hashes 1 6
Images 4,756 26 Information about images on the device, including file name, path, and metadata.
Installed Applications 259 20 Information about applications installed on the device, such as name (e.g., “Audible”), identifier (e.g., “com.audible.iphone”), and purchase date.
Instant Messages 552 19 Information about instant messages, such as from and to, subject, body, and date and time.
Journeys 18 12 Information about trips taken, such as journey (e.g., “Uber trip” or “My Location”), start and end times, and from and to points (e.g., “5/2/2017 11:22:50 PM(UTC-4): (25.703148, -75.041062)”).
Locations 896 19 Information about stored location information, such as wi-fi connections, geo-tagged media files, geo-tagged calls and chats, and map application activity.
Log Entries 310 17 Information about use of the device, such as the amount of logged data usage by applications.
MMS Messages 445 37 Information about MMS (Multimedia Messaging Service) messages, such as from and to, date, and body of message.
Notes 35 16 Information about notes on the device, such as title, body, and dates and times.
Passwords 164 12 Information about passwords used with the device.
Searched Items 658 13 Information about searches performed using the device, such as source (e.g., “Safari”) and value (e.g., “what’s the difference between contemporary and modern”).
SMS Messages 12,050 19 Information about SMS (Short Message Service) messages, such as from and to, date, and body of message.
Text 35 26 Information about text and log files stored on the device.
User Accounts 21 15 Information about user accounts on the device, including user name (e.g., [email protected]) and entries (e.g., “-Account Description: Exchange”).
Videos 28 26 Information about video files on the device, including name, path, and metadata.
Voicemails 1,498 11 Information about voicemail messages, including from (phone number and possibly name), date and time, and duration.
Web Bookmarks 56 16 Information about webpages bookmarked.
Web History 5,466 11 Information about webpages visited.
Wireless Networks 52 17 Information about wireless networks connected to.
Timeline 47,389 23 Comprehensive information about device activity, including user communication activity.

A closer look. Each of the tabs contains a range of information about its contents. To illustrate this, we will take a look at the “Databases” tab (see screenshot). This tab lists the 586 SQLite databases on the device. Each database has up to 29 columns of information.

This database report is the first report you should review if you are interested in seeing what information is available on the mobile device. You may find that the user relied heavily on a new or obscure application whose data your forensic tool did not extract, in which case key information could be omitted from your analysis.
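
One way to triage the database report is sketched below. It assumes the “Databases” tab has been exported to CSV with columns named “Name”, “Row count”, “Decoded by”, and “Application”, and it treats an empty “Decoded by” cell as a sign that the tool did not decode the database. Both the CSV layout and that interpretation are assumptions you would need to confirm against your own report; the two `example_` rows are invented.

```python
import csv
import io

# Hypothetical CSV export of the "Databases" tab. Only the first row's name,
# row count, and application come from the report discussed in this article.
EXPORT = """Name,Row count,Decoded by,Application
AEAnnotation_v10312011_1728_local.sqlite,126,,iBooks
example_chat.sqlite,"5,400",,ExampleChat
example_decoded.sqlite,300,ExampleDecoder,ExampleApp
"""


def undecoded_databases(rows):
    """Return rows whose "Decoded by" cell is empty, largest row count first."""

    def row_count(row):
        # Tolerate blank cells and thousands separators such as "5,400".
        return int((row.get("Row count") or "0").replace(",", ""))

    flagged = [r for r in rows if not (r.get("Decoded by") or "").strip()]
    return sorted(flagged, key=row_count, reverse=True)


rows = list(csv.DictReader(io.StringIO(EXPORT)))
for row in undecoded_databases(rows):
    print(row["Application"], row["Name"], row["Row count"])
```

Sorting by row count puts the busiest undecoded databases at the top, which is usually where a closer look pays off first.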

If you find an important application that is listed in the database report, you may be able to review the database by opening it with an SQLite reader. Generally, it is advisable to partner with an experienced forensic expert for advanced analyses of mobile SQLite databases. It will be necessary to spend time understanding the structure of that application’s database before you can determine what it contains. You can learn more about SQLite here and download a free SQLite reader here.
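
If you prefer a script to a graphical reader, Python's standard `sqlite3` module can give a first look at a database's structure. The sketch below lists each table with its columns and row count. The helper names and the in-memory `notes` table are hypothetical, and in practice you would point `open_read_only` at a working copy of the extracted file rather than the original.

```python
import sqlite3
from pathlib import Path


def open_read_only(path):
    """Open a database file using SQLite's read-only URI mode."""
    return sqlite3.connect(Path(path).resolve().as_uri() + "?mode=ro", uri=True)


def summarize_tables(conn):
    """Return {table name: (column names, row count)} for each user table."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    summary = {}
    for name in names:
        if name.startswith("sqlite_"):
            continue  # skip SQLite's internal bookkeeping tables
        quoted = '"' + name.replace('"', '""') + '"'
        columns = [col[1] for col in conn.execute(f"PRAGMA table_info({quoted})")]
        count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        summary[name] = (columns, count)
    return summary


# Demonstration on an in-memory database with a hypothetical table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT, created TEXT)")
demo.executemany(
    "INSERT INTO notes (body, created) VALUES (?, ?)",
    [("first", "2017-05-01"), ("second", "2017-05-02")],
)
print(summarize_tables(demo))
```

A listing like this shows what a database holds, not what the values mean. Interpreting an application's tables still takes the time and expertise described above.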

As mentioned, the “Databases” tab contains 29 columns. The following table is the entry for one of the databases, showing the columns and their contents for that database. Note that for some of these columns we have provided explanations; for others we have not:

Column Title Contents Explanation
# 5 A sequential number assigned when the spreadsheet report is generated.
File System iPhone
Name AEAnnotation_v10312011_1728_local.sqlite
Row count 126 The number of records in this database, which represents the amount of activity contained in the database.
Decoded by Whether Cellebrite decoded and is able to export the contents.
Application iBooks The database’s application, such as Facebook or Address Book.
Size (bytes) 102400
Path iPhone/Applications/com.apple.iBooks/Documents/storeFiles/AEAnnotation_v10312011_1728_local.sqlite
Encrypted Whether the database is encrypted. This is extremely important for identifying any data sources that may be impossible to extract because of encryption.
Meta Data iPhone Domain:AppDomain-com.apple.iBooks
Encryption Key:030000002B5BCDAD7156BBEA9D9847102D7F7604594AA54DAB03EF75C5B347A8120AC9C8AB7271F3D030E65F
iTunes Backup original file name:715cda36fa46cecd13b4fc1ba61c82817895224f
File size:102400 Bytes
Chunks:1
Date & Time
Creation time:12/25/2014 4:42:37 PM(UTC+0)
Modify time:5/3/2017 3:16:59 PM(UTC+0)
Last access time:
Deleted time:
Offsets
Data offset:0x0
Tags Database
MD5 b7c315510c4c398867c4f260413f44de
Hash sets
Category
SHA256
Modified-Date 5/3/2017
Modified-Time 5/3/2017 11:16:59 AM(UTC-4)
Created-Date 12/25/2014
Created-Time 12/25/2014 11:42:37 AM(UTC-5)
Accessed-Date
Accessed-Time
Deleted
Deleted-Date
Deleted-Time
Tag Note
Additional file info
Attachment source app
Carving False
Duplicates
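
The entry above records the same moments twice: once in UTC inside the “Meta Data” cell and once with a local offset in the “Modified-Time” and “Created-Time” columns. When you combine timestamps from several tabs, it helps to normalize them first. The following is a minimal sketch that parses the timestamp style shown in this entry; the function name is hypothetical, and the pattern assumes the format always looks like the examples here.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches text such as "5/3/2017 11:16:59 AM(UTC-4)"; minutes in the offset are optional.
_PATTERN = re.compile(
    r"^(?P<local>.+?)\(UTC(?P<sign>[+-])(?P<hours>\d{1,2})(?::(?P<minutes>\d{2}))?\)$"
)


def parse_report_time(text):
    """Parse a report timestamp into a timezone-aware datetime."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognized timestamp: {text!r}")
    offset = timedelta(
        hours=int(match["hours"]), minutes=int(match["minutes"] or 0)
    )
    if match["sign"] == "-":
        offset = -offset
    local = datetime.strptime(match["local"].strip(), "%m/%d/%Y %I:%M:%S %p")
    return local.replace(tzinfo=timezone(offset))


modify_utc = parse_report_time("5/3/2017 3:16:59 PM(UTC+0)")
modified_local = parse_report_time("5/3/2017 11:16:59 AM(UTC-4)")

# Timezone-aware datetimes compare by instant, so these two should match.
print(modify_utc == modified_local)
print(modified_local.astimezone(timezone.utc).isoformat())
```

Converting everything to UTC before sorting avoids the out-of-order timelines that can result when a custodian travels or daylight saving time changes the local offset.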

Conducting advanced mobile device data analysis

The volume and variety of information available from even a single cellphone offer a wide array of ways to analyze the content from mobile devices. As discussed above, the mobile phone we looked at for this exercise contained 35 categories of data, with a tab in the mobile device spreadsheet for each category. The tabs contained a combined total of 136,370 rows of information, with between 1 and 47,389 rows per tab, an average of 4,132, and a median of 2,781 rows per tab. The tabs contained a combined total of 671 columns of information, with between 6 and 50 columns per tab, an average of 20, and a median of 19 columns per tab. In all, there were 210 different column names. Most column names appeared only one, two, or three times. Others appeared regularly, such as “Name” (16 times); “Source” (12 times); and “Created-Date”, “Created-Time”, “Modified-Date”, and “Modified-Time” (10 times each).

Type of Information | Total in Spreadsheet | Maximum | Minimum | Average | Median
Category (one per tab) | 35 | | | |
Rows | 136,370 | 47,389 per tab | 1 per tab | 4,132 per tab | 2,781 per tab
Columns | 671 | 50 per tab | 6 per tab | 20 per tab | 19 per tab
Column Names | 210 | 31 occurrences | 1 occurrence | 3 occurrences | 1 occurrence
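
Column names that recur across tabs are worth finding early, because shared fields such as dates and sources are what let you line up records from different tabs. Here is a rough sketch of how such a count could be produced. The header lists are hypothetical apart from the “Databases” names, which are taken from the entry shown earlier, and real tabs have many more columns.

```python
from collections import Counter

# Hypothetical header rows for three tabs.
HEADERS = {
    "Call Log": ["#", "Parties", "Date", "Time", "Duration", "Source"],
    "SMS Messages": ["#", "Parties", "Date", "Time", "Body", "Source"],
    "Databases": ["#", "Name", "Application", "Path", "Modified-Date", "Modified-Time"],
}


def column_name_counts(headers_by_tab):
    """Count in how many tabs each column name appears."""
    return Counter(
        name for columns in headers_by_tab.values() for name in set(columns)
    )


def shared_columns(headers_by_tab, minimum_tabs=2):
    """Return, in sorted order, the column names found in at least `minimum_tabs` tabs."""
    counts = column_name_counts(headers_by_tab)
    return sorted(name for name, n in counts.items() if n >= minimum_tabs)


print(shared_columns(HEADERS))
```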

Given this volume and variety of data, what forms of analysis can be performed? You could try simple keyword searches or more advanced Boolean searches, but by themselves these approaches are not likely to deliver results of great interest. You could attempt some form of technology-assisted review (TAR), but even if you were able to figure out how to deploy a TAR tool against a group of spreadsheets, the data in the spreadsheet cells would be too sparse for most, if not all, TAR tools to deliver any meaningful results.

Instead, this is a great time to don the best thinking cap you can find. Start with your objectives. Are you, for example, attempting to determine whether you can prove an affirmative defense? If so, what are the elements of that defense? And for each element, what do you need to prove? And to prove each element, what information do you need? And to find that information, what do you need to look for?

Because this may be getting a bit too abstract, here are two specific examples of analyses that can be performed: contact resolution analysis, and geolocation analysis.

Contact Resolution Analysis. Contact resolution analysis matches the name of a contact in the communications data back to the real-world name or alias found in the address book. In other words, it is the matching of phone numbers and/or IDs to the names found in the address book. The following is an example of address book information that can be used:

The goal of the analysis is to identify “John Smith” every time one of his three phone numbers or email address appears within his communications. This process requires importing every communication record into a database, parsing each of the address book entries, and updating the To/From fields to include “John Smith” alongside the phone number or email address, because the communications may not have his name included in all communications. The challenge with this analysis is that address books are not always reliable. Some people make mistakes entering names and phone numbers in them, causing misidentified individuals. Also, office main phone numbers as opposed to direct phone numbers can cause issues, so careful analysis is required.
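
The matching step might look something like the sketch below. Everything in it is illustrative: the address book entries are invented, the helper names are hypothetical, and reducing phone numbers to their last ten digits is an assumption that suits North American numbers only. A number listed under more than one contact, such as an office main line, is flagged rather than guessed at.

```python
import re

# Hypothetical address book: three direct numbers and an email address for one
# person, plus an office main line that two contacts both list.
ADDRESS_BOOK = {
    "John Smith": [
        "(312) 555-0101",
        "312-555-0102",
        "+1 312 555 0103",
        "John.Smith@example.com",
        "312-555-0199",
    ],
    "Jane Doe": ["312-555-0199"],
}


def normalize(identifier):
    """Lower-case email addresses; reduce phone numbers to their last ten digits."""
    identifier = identifier.strip()
    if "@" in identifier:
        return identifier.lower()
    return re.sub(r"\D", "", identifier)[-10:]


def build_lookup(address_book):
    """Map each normalized identifier to the set of contact names that list it."""
    lookup = {}
    for name, identifiers in address_book.items():
        for identifier in identifiers:
            lookup.setdefault(normalize(identifier), set()).add(name)
    return lookup


def resolve(identifier, lookup):
    """Attach a contact name to an identifier, flagging ambiguous matches."""
    names = sorted(lookup.get(normalize(identifier), ()))
    if not names:
        return identifier  # no match: leave the raw value for manual review
    if len(names) > 1:
        return f"{identifier} [ambiguous: {', '.join(names)}]"
    return f"{names[0]} <{identifier}>"


lookup = build_lookup(ADDRESS_BOOK)
for raw in ["+13125550101", "john.smith@EXAMPLE.com", "3125550199", "7735550000"]:
    print(resolve(raw, lookup))
```

Keeping the original identifier alongside the resolved name preserves the underlying evidence and makes it easier to correct a bad address book entry later.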

Once the matching is performed, you can mine the communications data for patterns and identify key individuals. Analyzing communication patterns can demonstrate relationships among parties and the frequency of communication.  Tools such as Brainspace, NexLP, and Tableau can visualize communication patterns and enhance your review of this information.  The following example shows a visualization of email communications of Enron data in NexLP:

Geolocation analysis. While analysis of a custodian’s location history has frequently been a tool for criminal cases, it also can be useful in civil matters and investigations. For cases such as intellectual property theft and internal investigations, knowing where someone was at a specific time is important. Mobile devices—for better or worse—might provide that information. Some phones have built-in applications that track a user’s geolocation, and some user-installed applications may collect geolocation data. Applications such as map and chat programs can track and store a user’s location, and some users opt to include geotagging in their photos and videos. All this information can be analyzed.

Most geolocation data is reliable enough for analysis, but the location information may be off by up to several hundred yards if the user is in a location with a bad GPS signal or is traveling at high speeds. That means that cases that require pinpoint-accurate location information may not be able to fully rely on the data from the mobile device itself and may require that information be obtained from other sources.
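
One practical consequence is that a recorded point should be compared to a place of interest with a tolerance rather than as an exact match. The sketch below uses the standard haversine formula for great-circle distance. The 300-meter default tolerance and the site coordinates are assumptions chosen for illustration; the recorded point is the one shown in the “Journeys” example earlier.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_tolerance(point, site, tolerance_m=300.0):
    """True if `point` lies within `tolerance_m` meters of `site`."""
    return haversine_m(*point, *site) <= tolerance_m


recorded = (25.703148, -75.041062)  # point from the "Journeys" example
site = (25.7040, -75.0410)          # hypothetical place of interest
print(round(haversine_m(*recorded, *site)), within_tolerance(recorded, site))
```

The right tolerance depends on the matter. A wider radius catches more true visits but also more coincidental ones, so it is worth testing a few values against events whose location you already know.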

Mobile device forensic reports typically store geolocation data in a single report. For Cellebrite, this information can be found in the “Locations” report. These reports include information about wireless network usage, where pictures were taken, maps (lookup and starting location), and even locations where text messages were sent. You can filter the type of events and build maps or timeline analysis to show where and when events took place. The following is an example of a simplified geo-mapping of the location information from a Cellebrite report.

Conclusion

Although often not even preserved in civil litigation, the content from mobile devices offers enormous opportunities to lawyers and their staff seeking to better understand their matters and hoping to be able to build, test, and present more effective narratives.

The basic reports most commonly generated about the contents of mobile devices offer a starting point, and sometimes a very good one, for those seeking insight from that data. To take better advantage of the opportunities that mobile device data presents, you should consider moving beyond the basics and taking the first steps toward performing more advanced analytics on that data.

George Socha
Senior Vice President of Brand Awareness at Reveal
George Socha is the Senior Vice President of Brand Awareness at Reveal, where he promotes brand awareness, helps guide development of product roadmap and consults with customers on effective deployment of legal technology.

Named an “E-Discovery Trailblazer” by The American Lawyer, George has assisted corporate, law firm, and government clients with all facets of electronic discovery, including information governance, domestically and globally. He served clients in a variety of industries including pharmaceutical, energy, retail, banking and technology, among others. As a renowned industry thought leader, he has authored more than 50 articles and spoken at more than 200 engagements across the world on a variety of e-discovery topics. His extensive knowledge has also been utilized more than 20 times to provide expert testimony.

Co-founder of the Electronic Discovery Reference Model (EDRM), a framework that outlines the standards for the recovery and discovery of digital data, and the Information Governance Reference Model (IGRM), a similar framework specific to information management, George is skilled at developing and implementing electronic discovery strategies and managing electronic discovery processes.
Martha Louks
Director of Technology Services at McDermott Will & Emery LLP
Martha Louks focuses on implementing high-value, efficiency-driven solutions that improve the delivery of legal services for clients. Martha collaborates with legal teams to develop technology and workflow approaches that align with case strategy, reduce costs and improve efficiency. Martha has extensive experience developing defensible processes that center on the tailored use of technology and experienced professionals to achieve results for our clients. She evaluates the efficacy of a wide range of technologies to determine which tools are best suited to a matter’s unique needs. Martha also oversees technology and provides consulting services for McDermott Discovery. She prepares discovery preservation plans capable of withstanding intense scrutiny while simultaneously addressing the flexibility necessary for clients to meet their business obligations. She advises on electronic evidence considerations, working closely with legal teams to incorporate the results of forensic investigation into legal analysis. Martha is also highly skilled in using artificial intelligence for Technology Assisted Review and investigations, consulting on the defensibility and effectiveness of varied workflow approaches. Martha is a Certified Relativity Expert, having three concurrent certifications: Relativity Certified Administrator, Relativity Analytics Specialist and Relativity Assisted Review Specialist.
Joe Sremack
Director, Data Analytics & Software Robotics at BDO
Joe Sremack is a director in BDO’s Data Analytics & Software Robotics practice. His primary focus is developing and implementing strategies and technologies to assist corporate and legal clients in matters involving complex technology issues and investigations. Joe has deep knowledge of structured data collection and analysis, information technology (IT) assessments, electronic discovery, and software analysis. A computer scientist by training, Joe has conducted numerous investigations and assessments—including the three largest Ponzi schemes in history—involving systems investigations, data analysis, source code analysis, data compliance assessments and the evaluation of technology solutions. He has assisted clients across the U.S. and internationally in such matters as financial crime investigations, regulatory compliance assessments, intellectual property theft investigations and antitrust disputes. Joe has worked with clients across a wide range of data-intensive industries, including healthcare, finance, technology, and energy. He has also served clients in the hospitality, non-profit and telecommunications sectors. Joe frequently presents and writes on topics involving transactional data systems and is the author of Big Data Forensics, a technical guide on performing investigations of large-scale, clustered data systems. Prior to joining BDO, Joe held leadership positions at several expert service and advisory consulting firms.
