Graph Databases – Neo4j (With Example)

Although relational databases are still the most popular type of database, several “alternative” NoSQL databases have emerged in recent years. These are databases that are not based on the relational data model. NoSQL systems arose from the demand for more flexibility and better performance when storing large amounts of data, driven mainly by the growth of the Internet and the increasing volume of data being generated. One popular type of NoSQL database is the graph database, which is the topic of the rest of this article.

What are graph databases?

Graph databases use a graph as the data model instead of relational tables. A graph is a data structure that consists of nodes and the edges (relationships) between them. Nodes store data as a set of key/value pairs (properties), while relationships represent the connections between the data; in Neo4j, relationships can carry properties as well.
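
To make the model concrete, here is a minimal sketch of a property graph in plain Python. The class names Node and Relationship and the helper outgoing are hypothetical, chosen for illustration only; this is not how Neo4j stores data internally.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node: a set of labels plus key/value properties (illustrative only)."""
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed connection between two nodes."""
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


def outgoing(node, relationships, rel_type):
    """Return the nodes reached from `node` via relationships of `rel_type`."""
    return [r.end for r in relationships
            if r.start is node and r.type == rel_type]


# Two nodes and one relationship between them.
alen = Node({"Person"}, {"name": "Alen"})
neo = Node({"Database"}, {"name": "Neo4j"})
relationships = [Relationship(alen, "LIKE", neo)]

for target in outgoing(alen, relationships, "LIKE"):
    print(target.properties["name"])
```

The point of the sketch is that connections are stored as first-class objects, so following one is a direct lookup rather than a join.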

Neo4j – Why and when to use it?

Neo4j is one of the world’s leading open-source graph databases and is written in Java. Neo4j offers:

Flexible schema – No need for a fixed data model. Attributes can be added and removed as needed.

Scalability – lets you increase the number of reads/writes and the size of the database without significantly affecting query speed.

Replication – enables data replication for higher availability and reliability.

ACID (Atomicity, Consistency, Isolation, Durability) transaction model

Built-in web application – You can write queries that create and search data through a simple web interface.

Whiteboard friendly – Neo4j is said to be whiteboard friendly because you can design the data model simply by drawing nodes and relationships.

Supports indexes

Some of the best use cases for Neo4j:

Fraud detection and prevention – Big companies lose billions of dollars a year to fraudsters who use sophisticated schemes such as identity theft, impersonation, credit card fraud and money laundering. Neo4j helps detect them. Below, we demonstrate one example of counteracting fraud.

Monitoring network infrastructure

Real-time recommendation system – If you own an online store, Neo4j makes it easy to recommend additional items to customers based on what they are looking for.

Social networking – Speeds up both development and the application itself, because connections between users map directly onto relationships in the graph.

Access rights and identity control

By clicking here you can see an example of creating an email notification system using a Neo4j database.

Cypher – Query language

Cypher is a declarative query language inspired by SQL. It allows us to specify what we want to select, add, edit, or delete from a graph without explicitly stating how it will be done.

MATCH (node1:Label1)-[:RELATION]->(node2:Label2)
WHERE node1.propertyA = $value
RETURN node2.propertyA, node2.propertyB


Creating Nodes

CREATE (you:Person {name: "Alen"}) RETURN you

This query creates a new node with the label Person and a property name whose value is "Alen". The variable you refers to the node for the rest of the query.

Creating relationships

MATCH (you:Person {name: "Alen"})
CREATE (you)-[like:LIKE]->(neo:Database {name: "Neo4j"})
RETURN you, like, neo

The MATCH clause finds the Person node whose name property is "Alen" and binds it to the variable you, which the rest of the query uses to refer to that node. The CREATE clause then creates a LIKE relationship from it to a new Database node whose name property is "Neo4j".

MATCH (you:Person {name: "Alen"})
FOREACH (name IN ["Alen", "John", "Anna", "Luck", "Peter"] |
  CREATE (you)-[:FRIEND]->(:Person {name: name}))

This query creates a new Person node for each name listed in the FOREACH clause and a FRIEND relationship from you to each of them.

Search for friends

MATCH (you {name: "Alen"})-[:FRIEND]->(yourFriends)
RETURN you, yourFriends

This query returns the node whose name property is "Alen" together with every node it points to through a FRIEND relationship.

Creating a friend of a friend

MATCH (neo:Database {name: "Neo4j"})
MATCH (ana:Person {name: "Anna"})
CREATE (ana)-[:FRIEND]->(:Person:Expert {name: "Jennifer"})-[:WORKED_WITH]->(neo)

Finding the Shortest Path

MATCH (you {name: "Alen"})
MATCH (expert)-[:WORKED_WITH]->(db:Database {name: "Neo4j"})
MATCH path = shortestPath((you)-[:FRIEND*..5]-(expert))
RETURN db, expert, path

This query follows FRIEND relationships, up to 5 hops away, to find the shortest path from us to a person who has worked with Neo4j.

As we can see, Neo4j has a built-in shortestPath function. For unweighted paths like this one, it uses a bidirectional breadth-first search in the background to find the path with the fewest relationships between two nodes. Because the search follows relationships directly instead of joining tables, it typically takes much less time than an equivalent query in a relational database.
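
When every relationship counts as one hop, as in the query above, a breadth-first search is enough to find a shortest path. The following is a minimal Python sketch of that idea with a hop limit. It is a simplified, one-directional search over a hypothetical adjacency dictionary named friends, not Neo4j's actual implementation.

```python
from collections import deque


def shortest_path(friends, start, goal, max_hops=5):
    """Breadth-first search; returns the list of names on a shortest
    path from start to goal, or None if none exists within max_hops."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if len(path) - 1 == max_hops:
            continue  # hop limit reached, do not expand further
        for neighbor in friends.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# FRIEND relationships listed in both directions, because the pattern
# (you)-[:FRIEND*..5]-(expert) ignores relationship direction.
friends = {
    "Alen": ["John", "Anna", "Luck", "Peter"],
    "John": ["Alen"],
    "Anna": ["Alen", "Jennifer"],
    "Luck": ["Alen"],
    "Peter": ["Alen"],
    "Jennifer": ["Anna"],
}

print(shortest_path(friends, "Alen", "Jennifer"))
```

Breadth-first search visits nodes in order of distance from the start, so the first time it reaches the goal it has found a path with the fewest hops.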

Deleting a node

MATCH (n:Useless)
DELETE n

Deleting relationships

MATCH (n {name: 'Alen'})-[r:FRIEND]->()
DELETE r

Deleting a node and all its relations

MATCH (n {name: 'Alen'})
DETACH DELETE n

Modifying and Adding Attributes

MATCH (n {name: 'Alen'})
SET n.surname = 'Ibric'
RETURN n

Removing attributes

MATCH (alen {name: 'Alen'})
REMOVE alen.age
RETURN alen

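Creating an index

// Neo4j 3.x syntax; newer versions use: CREATE INDEX FOR (p:Person) ON (p.firstname)
CREATE INDEX ON :Person(firstname)

Creating a composite index

CREATE INDEX ON :Person(firstname, surname)

Deleting an index

// Neo4j 3.x syntax; newer versions drop an index by its name
DROP INDEX ON :Person(firstname)

Listing all indexes

// Neo4j 3.x syntax; newer versions use: SHOW INDEXES
CALL db.indexes()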

An example of using a Neo4j database to detect fraudsters

One common fraud pattern follows this scenario. A group of fraudsters opens a large number of accounts at a bank. The accounts are opened using different combinations of the same first and last names, telephone numbers, addresses, social security numbers, etc. After opening, the accounts are used normally so that no one suspects fraud: there are regular inflows and outflows of money, the holders answer calls from bank officers, submit the necessary documentation, receive mail from the bank, etc. After a while, all the accounts are drawn down to their maximum permitted overdraft and the holders disappear. They stop responding and the bank cannot contact them. The debt is written off and the bank loses a huge amount of money.

In order to demonstrate how Neo4j helps combat this problem, we need to prepare a work environment.

This can be done through an online tool or by installing the Neo4j package on a local machine. Installation instructions can be found in the Neo4j operations manual listed under Literature.

Once the work environment is set up, a test data set needs to be loaded.

You can do this with the following command:


// Create account holders
CREATE (accountHolder1:AccountHolder {
  FirstName: "John",
  LastName: "Doe",
  UniqueId: "JohnDoe"})

CREATE (accountHolder2:AccountHolder {
  FirstName: "Jane",
  LastName: "Appleseed",
  UniqueId: "JaneAppleseed"})

CREATE (accountHolder3:AccountHolder {
  FirstName: "Matt",
  LastName: "Smith",
  UniqueId: "MattSmith"})

// Create Address
CREATE (address1:Address {
  Street: "123 NW 1st Street",
  City: "San Francisco",
  State: "California",
  ZipCode: "94101"})

// Connect 3 account holders to 1 address
CREATE (accountHolder1)-[:HAS_ADDRESS]->(address1),
  (accountHolder2)-[:HAS_ADDRESS]->(address1),
  (accountHolder3)-[:HAS_ADDRESS]->(address1)

// Create Phone Number
CREATE (phoneNumber1:PhoneNumber {PhoneNumber: "555-555-5555"})

// Connect 2 account holders to 1 phone number
CREATE (accountHolder1)-[:HAS_PHONENUMBER]->(phoneNumber1),
  (accountHolder2)-[:HAS_PHONENUMBER]->(phoneNumber1)

// Create SSN
CREATE (ssn1:SSN {SSN: "241-23-1234"})

// Connect 2 account holders to 1 SSN
CREATE (accountHolder2)-[:HAS_SSN]->(ssn1),
  (accountHolder3)-[:HAS_SSN]->(ssn1)

// Create SSN and connect 1 account holder
CREATE (ssn2:SSN {SSN: "241-23-4567"})<-[:HAS_SSN]-(accountHolder1)

// Create Credit Card and connect 1 account holder
CREATE (creditCard1:CreditCard {
  AccountNumber: "1234567890123456",
  Limit: 5000, Balance: 1442.23,
  ExpirationDate: "01-20",
  SecurityCode: "123"})<-[:HAS_CREDITCARD]-(accountHolder1)

// Create a Bank Account and connect 1 account holder
CREATE (bankAccount1:BankAccount {
  AccountNumber: "2345678901234567",
  Balance: 7054.43})<-[:HAS_BANKACCOUNT]-(accountHolder1)

// Create Credit Card and connect 1 account holder
CREATE (creditCard2:CreditCard {
  AccountNumber: "1234567890123456",
  Limit: 4000, Balance: 2345.56,
  ExpirationDate: "02-20",
  SecurityCode: "456"})<-[:HAS_CREDITCARD]-(accountHolder2)

// Create a Bank Account and connect 1 account holder
CREATE (bankAccount2:BankAccount {
  AccountNumber: "3456789012345678",
  Balance: 4231.12})<-[:HAS_BANKACCOUNT]-(accountHolder2)

// Create Unsecured Loan and connect 1 account holder
CREATE (unsecuredLoan2:UnsecuredLoan {
  AccountNumber: "4567890123456789-0",
  Balance: 9045.53,
  APR: .0541,
  LoanAmount: 12000.00})<-[:HAS_UNSECUREDLOAN]-(accountHolder2)

// Create a Bank Account and connect 1 account holder
CREATE (bankAccount3:BankAccount {
  AccountNumber: "4567890123456789",
  Balance: 12345.45})<-[:HAS_BANKACCOUNT]-(accountHolder3)

// Create Unsecured Loan and connect 1 account holder
CREATE (unsecuredLoan3:UnsecuredLoan {
  AccountNumber: "5678901234567890-0",
  Balance: 16341.95, APR: .0341,
  LoanAmount: 22000.00})<-[:HAS_UNSECUREDLOAN]-(accountHolder3)

// Create Phone Number and connect 1 account holder
CREATE (phoneNumber2:PhoneNumber {
  PhoneNumber: "555-555-1234"})<-[:HAS_PHONENUMBER]-(accountHolder3)

RETURN *

The result of executing this query is shown in Figure

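After we have loaded the test data, we need to find all the account holders who share a piece of contact information (an address, a phone number or a social security number) with at least one other account holder.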

We will do this with the following query:

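MATCH (accountHolder:AccountHolder)-[]->(contactInformation)
WITH contactInformation,
     count(accountHolder) AS RingSize
MATCH (contactInformation)<-[]-(accountHolder)
WITH collect(DISTINCT accountHolder.UniqueId) AS AccountHolders,
     contactInformation, RingSize
WHERE RingSize > 1
RETURN AccountHolders AS FraudRing,
       labels(contactInformation) AS ContactType,
       RingSize
ORDER BY RingSize DESC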
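
The idea behind this query, grouping account holders by the contact details they share and keeping only groups with more than one member, can be sketched in plain Python. The links list below is a hand-written copy of the contact relationships from the test data set, and the function name find_rings is hypothetical; this is an illustration of the logic, not a replacement for the Cypher query.

```python
from collections import defaultdict

# (account holder, contact type, contact value), copied from the test data.
links = [
    ("JohnDoe", "Address", "123 NW 1st Street"),
    ("JaneAppleseed", "Address", "123 NW 1st Street"),
    ("MattSmith", "Address", "123 NW 1st Street"),
    ("JohnDoe", "PhoneNumber", "555-555-5555"),
    ("JaneAppleseed", "PhoneNumber", "555-555-5555"),
    ("JaneAppleseed", "SSN", "241-23-1234"),
    ("MattSmith", "SSN", "241-23-1234"),
    ("JohnDoe", "SSN", "241-23-4567"),
    ("MattSmith", "PhoneNumber", "555-555-1234"),
]


def find_rings(links):
    """Group holders by shared contact info; keep groups larger than one."""
    holders_by_contact = defaultdict(list)
    for holder, contact_type, value in links:
        holders_by_contact[(contact_type, value)].append(holder)
    rings = [
        (sorted(holders), contact_type, len(holders))
        for (contact_type, _value), holders in holders_by_contact.items()
        if len(holders) > 1
    ]
    # Largest rings first, like ORDER BY RingSize DESC.
    return sorted(rings, key=lambda ring: ring[2], reverse=True)


for fraud_ring, contact_type, ring_size in find_rings(links):
    print(fraud_ring, contact_type, ring_size)
```

Contact details used by only one holder, such as the second SSN and the second phone number, are filtered out, which corresponds to the WHERE RingSize > 1 condition.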

Now that we’ve found all the account holders who are potential fraudsters, we need to find out the maximum loss each ring could cause if it commits fraud.

The following query calculates it.

MATCH (accountHolder:AccountHolder)-[]->(contactInformation)
WITH contactInformation,
     count(accountHolder) AS RingSize
MATCH (contactInformation)<-[]-(accountHolder),
      (accountHolder)-[r:HAS_CREDITCARD|HAS_UNSECUREDLOAN]->(unsecuredAccount)
WITH collect(DISTINCT accountHolder.UniqueId) AS AccountHolders,
     contactInformation, RingSize,
     SUM(CASE type(r)
       WHEN 'HAS_CREDITCARD' THEN unsecuredAccount.Limit
       WHEN 'HAS_UNSECUREDLOAN' THEN unsecuredAccount.Balance
       ELSE 0
     END) AS FinancialRisk
WHERE RingSize > 1
RETURN AccountHolders AS FraudRing,
       labels(contactInformation) AS ContactType,
       RingSize,
       round(FinancialRisk) AS FinancialRisk
ORDER BY FinancialRisk DESC
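
The risk calculation can be checked by hand with a short Python sketch: for every member of a ring, add the credit card limit and the outstanding unsecured loan balance. The dictionaries rings and exposure are hand-written copies of the relevant test data, and the function name financial_risk is hypothetical.

```python
# Rings of account holders who share a contact detail (from the test data).
rings = {
    "Address": ["JohnDoe", "JaneAppleseed", "MattSmith"],
    "PhoneNumber": ["JohnDoe", "JaneAppleseed"],
    "SSN": ["JaneAppleseed", "MattSmith"],
}

# Unsecured products per holder: (relationship type, account properties).
exposure = {
    "JohnDoe": [("HAS_CREDITCARD", {"Limit": 5000, "Balance": 1442.23})],
    "JaneAppleseed": [
        ("HAS_CREDITCARD", {"Limit": 4000, "Balance": 2345.56}),
        ("HAS_UNSECUREDLOAN", {"Balance": 9045.53, "LoanAmount": 12000.00}),
    ],
    "MattSmith": [
        ("HAS_UNSECUREDLOAN", {"Balance": 16341.95, "LoanAmount": 22000.00}),
    ],
}


def financial_risk(members):
    """Sum credit card limits and unsecured loan balances for a ring."""
    total = 0.0
    for holder in members:
        for rel_type, account in exposure.get(holder, []):
            if rel_type == "HAS_CREDITCARD":
                total += account["Limit"]    # the full limit can be drawn
            elif rel_type == "HAS_UNSECUREDLOAN":
                total += account["Balance"]  # the outstanding loan balance
    return round(total)


# Highest risk first, like ORDER BY FinancialRisk DESC.
for contact_type, members in sorted(
        rings.items(), key=lambda item: financial_risk(item[1]), reverse=True):
    print(members, contact_type, len(members), financial_risk(members))
```

Credit cards contribute their limit and not their current balance, because a fraudster can spend up to the full limit before disappearing.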

The end result of this example can be summarized as the following table

Literature
https://neo4j.com
https://rubygarage.org/blog/neo4j-database-guide-with-use-cases
https://neo4j.com/cypher-graph-query-language/
https://neo4j.com/docs/operations-manual/current/installation/
https://www.tutorialspoint.com/neo4j/neo4j_overview.htm
